Overview

Dataset statistics

Number of variables 31
Number of observations 31323
Missing cells 677801
Missing cells (%) 69.8%
Duplicate rows 0
Duplicate rows (%) 0.0%
Total size in memory 7.4 MiB
Average record size in memory 248.0 B
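
The headline figures above can be cross-checked against one another. A minimal sketch, assuming every variable contributes exactly one cell per observation and that MiB means 2^20 bytes (neither is stated in the report):

```python
# Figures copied from the dataset statistics above.
n_variables = 31
n_observations = 31323
missing_cells = 677801
total_size_mib = 7.4

# Assumption: the cell count is simply variables x observations.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.0f} B")
```

Under these assumptions the computed values agree with the reported 69.8% and, up to the rounding of 7.4 MiB, with the reported 248.0 B.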

Variable types

Categorical 23
Numeric 7
Unsupported 1

Dataset

Description Dataset about students who were invited to ask and answer one another's questions via a chatbot application supported by Telegram
Creator Matteo Busso, Massimo Stefan
Author Fausto Giunchiglia, Ivano Bison, Matteo Busso, Ronald Chenu-Abente, Marcelo Rodas Britez, Can Gunel, Giuseppe Veltri, Amalia de Götzen, Peter Kun, Amarsanaa Ganbold, Altangerel Chagnaa, George Gaskell, Miriam Bidoglia, Luca Cernuzzi, Alethia Hume, Jose Luis Zarza, Daniele Miorandi, Carlo Caprini
URL
Copyright (c) KnowDive 2022

Variable descriptions

university University where the experiment took place
id The task’s id
creationTs Creation timestamp of the task
lastUpdateTs Timestamp of last update of the task
taskTypeId The type of the task
requesterId The id of the user who created the task (asked the question)
appId The chatbot deployment
closeTs Closing timestamp of the task (if the task had an accepted answer)
communityId The Id of the users in the community
transactions.taskId The task’s id
transactions.label Label of action on task (answerTransaction, bestAnswerTransaction, CREATE_TASK, moreAnswerTransaction, notAnswerTransaction, reportAnswerTransaction, reportQuestionTransaction)
transactions.creationTs Creation timestamp of transaction
transactions.lastUpdateTs Last update timestamp of transaction
transactions.actioneerId Id of the user who performed the transaction action
transactions.id -
transactions.messages.appId App Id of the transaction message
transactions.messages.receiverId Id of the user who received the transaction message
transactions.messages.label Label of transaction message (AnsweredPickedMessage, AnsweredQuestionMessage, QuestionToAnswerMessage)
transactions.messages.attributes.taskId taskId of the transaction message
transactions.messages.attributes.userId User Id of the person asking the question
transactions.messages.attributes.question Question text of the transaction
transactions.messages.attributes.transactionId Id of the transaction
transactions.messages.attributes.answer Answer to a question
transactions.attributes.answer Answer to a question
transactions.attributes.transactionId Id of transaction
transactions.attributes.reason Reason for accepting an answer
goal.name Question (without duplicating extended questions)
goal.description Empty column
attributes.kindOfAnswerer -
attributes.answeredDetails -
transactions.count.id Count of follow-up actions on the task (the higher the number, the more actions were performed on the task)

Alerts

taskTypeId has constant value "ask4help" Constant
id has a high cardinality: 1828 distinct values High cardinality
creationTs has a high cardinality: 1424 distinct values High cardinality
lastUpdateTs has a high cardinality: 1425 distinct values High cardinality
closeTs has a high cardinality: 917 distinct values High cardinality
transactions.taskId has a high cardinality: 1828 distinct values High cardinality
transactions.creationTs has a high cardinality: 9858 distinct values High cardinality
transactions.lastUpdateTs has a high cardinality: 9850 distinct values High cardinality
transactions.messages.receiverId has a high cardinality: 1509 distinct values High cardinality
transactions.messages.attributes.taskId has a high cardinality: 1828 distinct values High cardinality
transactions.messages.attributes.question has a high cardinality: 1807 distinct values High cardinality
transactions.messages.attributes.answer has a high cardinality: 6973 distinct values High cardinality
transactions.attributes.answer has a high cardinality: 7069 distinct values High cardinality
goal.name has a high cardinality: 1830 distinct values High cardinality
attributes.answeredDetails has a high cardinality: 1064 distinct values High cardinality
requesterId is highly correlated with transactions.actioneerId and 1 other fields High correlation
transactions.actioneerId is highly correlated with requesterId and 1 other fields High correlation
transactions.id is highly correlated with transactions.messages.attributes.transactionId and 1 other fields High correlation
transactions.messages.attributes.userId is highly correlated with requesterId and 1 other fields High correlation
transactions.messages.attributes.transactionId is highly correlated with transactions.id and 2 other fields High correlation
transactions.attributes.transactionId is highly correlated with transactions.id and 2 other fields High correlation
transactions.count.id is highly correlated with transactions.messages.attributes.transactionId and 1 other fields High correlation
university is highly correlated with requesterId and 6 other fields High correlation
requesterId is highly correlated with university and 5 other fields High correlation
appId is highly correlated with university and 5 other fields High correlation
communityId is highly correlated with university and 5 other fields High correlation
transactions.label is highly correlated with transactions.messages.label High correlation
transactions.actioneerId is highly correlated with university and 5 other fields High correlation
transactions.id is highly correlated with transactions.messages.label and 2 other fields High correlation
transactions.messages.appId is highly correlated with university and 5 other fields High correlation
transactions.messages.label is highly correlated with university and 4 other fields High correlation
transactions.messages.attributes.userId is highly correlated with university and 6 other fields High correlation
transactions.messages.attributes.transactionId is highly correlated with transactions.id and 2 other fields High correlation
transactions.attributes.transactionId is highly correlated with transactions.id and 2 other fields High correlation
transactions.attributes.reason is highly correlated with transactions.count.id High correlation
transactions.count.id is highly correlated with transactions.messages.label and 3 other fields High correlation
id has 29495 (94.2%) missing values Missing
creationTs has 29495 (94.2%) missing values Missing
lastUpdateTs has 29495 (94.2%) missing values Missing
taskTypeId has 29495 (94.2%) missing values Missing
requesterId has 29495 (94.2%) missing values Missing
appId has 29495 (94.2%) missing values Missing
closeTs has 30125 (96.2%) missing values Missing
communityId has 29495 (94.2%) missing values Missing
transactions.taskId has 18281 (58.4%) missing values Missing
transactions.label has 18281 (58.4%) missing values Missing
transactions.creationTs has 18281 (58.4%) missing values Missing
transactions.lastUpdateTs has 18281 (58.4%) missing values Missing
transactions.actioneerId has 18281 (58.4%) missing values Missing
transactions.id has 28859 (92.1%) missing values Missing
transactions.messages.appId has 8869 (28.3%) missing values Missing
transactions.messages.receiverId has 9806 (31.3%) missing values Missing
transactions.messages.label has 6794 (21.7%) missing values Missing
transactions.messages.attributes.taskId has 8869 (28.3%) missing values Missing
transactions.messages.attributes.userId has 3595 (11.5%) missing values Missing
transactions.messages.attributes.question has 11237 (35.9%) missing values Missing
transactions.messages.attributes.transactionId has 22475 (71.8%) missing values Missing
transactions.messages.attributes.answer has 23666 (75.6%) missing values Missing
transactions.attributes.answer has 23677 (75.6%) missing values Missing
transactions.attributes.transactionId has 30119 (96.2%) missing values Missing
transactions.attributes.reason has 31316 (> 99.9%) missing values Missing
goal.name has 29466 (94.1%) missing values Missing
goal.description has 31323 (100.0%) missing values Missing
attributes.kindOfAnswerer has 29495 (94.2%) missing values Missing
attributes.answeredDetails has 29495 (94.2%) missing values Missing
transactions.count.id has 20745 (66.2%) missing values Missing
id is uniformly distributed Uniform
transactions.attributes.answer is uniformly distributed Uniform
goal.name is uniformly distributed Uniform
goal.description is an unsupported type, check if it needs cleaning or further analysis Unsupported
transactions.id has 372 (1.2%) zeros Zeros
transactions.count.id has 1456 (4.6%) zeros Zeros

Reproduction

Analysis started 2022-07-04 18:12:10.408468
Analysis finished 2022-07-04 18:12:38.670184
Duration 28.26 seconds
Software version pandas-profiling v3.2.0
Download configuration config.json
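
A report of this kind could be regenerated roughly as follows. This is a sketch, not the authors' actual script: the function name `build_profile` and both file paths are hypothetical, and the `dataset` and `variables` options are assumed to behave as in pandas-profiling v3.x.

```python
from typing import Dict, Optional


def build_profile(csv_path: str, html_path: str,
                  descriptions: Optional[Dict[str, str]] = None) -> None:
    """Profile a CSV export and write an HTML report (paths are hypothetical)."""
    # Imported lazily so this module still loads where the libraries are absent.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    profile = ProfileReport(
        df,
        title="Pandas Profiling Report",
        # Dataset-level metadata, shown in the "Dataset" tab of the overview.
        dataset={
            "description": "Dataset about students who were invited to pose "
                           "and answer questions to each other via a chatbot "
                           "application, supported by Telegram",
            "copyright_holder": "KnowDive",
            "copyright_year": "2022",
        },
        # Per-variable descriptions, shown under "Variable descriptions".
        variables={"descriptions": descriptions or {}},
    )
    profile.to_file(html_path)
```

A call such as `build_profile("tasks.csv", "report.html", {"university": "University where the experiment took place"})` would then produce the HTML file; the run time will differ from the 28.26 seconds recorded above.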

Variables

university
Categorical

HIGH CORRELATION

University where the experiment took place

Distinct 3
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 244.8 KiB
ENG 26491
AAU 3077
LSE 1755

Length

Max length 3
Median length 3
Mean length 3
Min length 3

Characters and Unicode

Total characters 93969
Distinct characters 7
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row ENG
2nd row ENG
3rd row ENG
4th row ENG
5th row ENG

Common Values

Value Count Frequency (%)
ENG 26491 84.6%
AAU 3077 9.8%
LSE 1755 5.6%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
eng 26491
84.6%
aau 3077
9.8%
lse 1755
5.6%

Most occurring characters

Value Count Frequency (%)
E 28246
30.1%
N 26491
28.2%
G 26491
28.2%
A 6154
6.5%
U 3077
3.3%
L 1755
1.9%
S 1755
1.9%

Most occurring categories

Value Count Frequency (%)
Uppercase Letter 93969
100.0%

Most frequent character per category

Uppercase Letter
Value Count Frequency (%)
E 28246
30.1%
N 26491
28.2%
G 26491
28.2%
A 6154
6.5%
U 3077
3.3%
L 1755
1.9%
S 1755
1.9%

Most occurring scripts

Value Count Frequency (%)
Latin 93969
100.0%

Most frequent character per script

Latin
Value Count Frequency (%)
E 28246
30.1%
N 26491
28.2%
G 26491
28.2%
A 6154
6.5%
U 3077
3.3%
L 1755
1.9%
S 1755
1.9%

Most occurring blocks

Value Count Frequency (%)
ASCII 93969
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
E 28246
30.1%
N 26491
28.2%
G 26491
28.2%
A 6154
6.5%
U 3077
3.3%
L 1755
1.9%
S 1755
1.9%

id
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

The task’s id

Distinct 1828
Distinct (%) 100.0%
Missing 29495
Missing (%) 94.2%
Memory size 244.8 KiB
605200e16d62bf159d1d7c8a 1
605200876d62bf159d1d7c88 1
605200306d62bf159d1d7c87 1
6051ff446d62bf159d1d7c86 1
6051ff3f6d62bf159d1d7c85 1
Other values (1823) 1823

Length

Max length 24
Median length 24
Mean length 24
Min length 24

Characters and Unicode

Total characters 43872
Distinct characters 16
Distinct categories 2
Distinct scripts 2
Distinct blocks 1

Unique

Unique 1828
Unique (%) 100.0%

Sample

1st row 60b9d9aa7061826a7a968f90
2nd row 60b9decb7061826a7a968f91
3rd row 60b9e4127061826a7a968f92
4th row 60b9e7247061826a7a968f93
5th row 60b9e7ea7061826a7a968f94
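
The 24-character hexadecimal ids resemble MongoDB ObjectIds, whose first four bytes encode a creation time in epoch seconds. The report does not name the id scheme, so this is an assumption; the helper name `objectid_like_timestamp` is hypothetical. A minimal sketch of decoding that leading timestamp:

```python
from datetime import datetime, timezone


def objectid_like_timestamp(hex_id: str) -> datetime:
    """Decode the first 4 bytes of a 24-hex-character id as epoch seconds (UTC).

    Assumes the id follows the ObjectId layout; the report does not confirm this.
    """
    if len(hex_id) != 24:
        raise ValueError(f"expected 24 hex characters, got {len(hex_id)}")
    seconds = int(hex_id[:8], 16)  # first 8 hex digits = 4 bytes
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


first_sample = objectid_like_timestamp("60b9d9aa7061826a7a968f90")
print(first_sample.isoformat())
```

For the first sampled id the decoded value is 1622792618 epoch seconds, one second before the first sampled creationTs value (1622792619.0) reported for the same dataset, which is consistent with the assumption.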

Common Values

Value Count Frequency (%)
605200e16d62bf159d1d7c8a 1 < 0.1%
605200876d62bf159d1d7c88 1 < 0.1%
605200306d62bf159d1d7c87 1 < 0.1%
6051ff446d62bf159d1d7c86 1 < 0.1%
6051ff3f6d62bf159d1d7c85 1 < 0.1%
6051fe956d62bf159d1d7c84 1 < 0.1%
6051fe636d62bf159d1d7c83 1 < 0.1%
6051f8076d62bf159d1d7c81 1 < 0.1%
6051f7c66d62bf159d1d7c80 1 < 0.1%
6051f4986d62bf159d1d7c7f 1 < 0.1%
Other values (1818) 1818 5.8%
(Missing) 29495 94.2%

Length

Histogram of lengths of the category
Value Count Frequency (%)
605200e16d62bf159d1d7c8a 1
0.1%
60ba16d77061826a7a968faa 1
0.1%
60b9e4127061826a7a968f92 1
0.1%
60b9e7247061826a7a968f93 1
0.1%
60b9e7ea7061826a7a968f94 1
0.1%
60b9e8887061826a7a968f95 1
0.1%
60b9e9787061826a7a968f96 1
0.1%
60b9fd327061826a7a968f97 1
0.1%
60ba03507061826a7a968f98 1
0.1%
60ba04167061826a7a968f99 1
0.1%
Other values (1818) 1818
99.5%

Most occurring characters

Value Count Frequency (%)
6 6648
15.2%
d 5396
12.3%
1 4048
9.2%
5 3478
7.9%
0 3461
7.9%
9 2884
6.6%
7 2849
6.5%
b 2766
6.3%
f 2681
6.1%
2 2645
6.0%
Other values (6) 7016
16.0%

Most occurring categories

Value Count Frequency (%)
Decimal Number 29125
66.4%
Lowercase Letter 14747
33.6%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
6 6648
22.8%
1 4048
13.9%
5 3478
11.9%
0 3461
11.9%
9 2884
9.9%
7 2849
9.8%
2 2645
9.1%
8 1304
4.5%
4 986
3.4%
3 822
2.8%
Lowercase Letter
Value Count Frequency (%)
d 5396
36.6%
b 2766
18.8%
f 2681
18.2%
a 1718
11.6%
c 1201
8.1%
e 985
6.7%

Most occurring scripts

Value Count Frequency (%)
Common 29125
66.4%
Latin 14747
33.6%

Most frequent character per script

Common
Value Count Frequency (%)
6 6648
22.8%
1 4048
13.9%
5 3478
11.9%
0 3461
11.9%
9 2884
9.9%
7 2849
9.8%
2 2645
9.1%
8 1304
4.5%
4 986
3.4%
3 822
2.8%
Latin
Value Count Frequency (%)
d 5396
36.6%
b 2766
18.8%
f 2681
18.2%
a 1718
11.6%
c 1201
8.1%
e 985
6.7%

Most occurring blocks

Value Count Frequency (%)
ASCII 43872
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
6 6648
15.2%
d 5396
12.3%
1 4048
9.2%
5 3478
7.9%
0 3461
7.9%
9 2884
6.6%
7 2849
6.5%
b 2766
6.3%
f 2681
6.1%
2 2645
6.0%
Other values (6) 7016
16.0%

creationTs
Categorical

HIGH CARDINALITY
MISSING

Creation timestamp of the task

Distinct 1424
Distinct (%) 77.9%
Missing 29495
Missing (%) 94.2%
Memory size 244.8 KiB
1970-01-01 01:00:44 402
2021-03-19T20:11:57Z 2
2021-03-16 12:17:57 2
2021-03-25 22:33:58 2
2021-03-21 20:30:13 1
Other values (1419) 1419

Length

Max length 20
Median length 19
Mean length 17.69967177
Min length 12

Characters and Unicode

Total characters 32355
Distinct characters 16
Distinct categories 5
Distinct scripts 2
Distinct blocks 1

Unique

Unique 1420
Unique (%) 77.7%

Sample

1st row 1622792619.0
2nd row 1622793931.0
3rd row 1622795282.0
4th row 1622796069.0
5th row 1622796267.0

Common Values

Value Count Frequency (%)
1970-01-01 01:00:44 402 1.3%
2021-03-19T20:11:57Z 2 < 0.1%
2021-03-16 12:17:57 2 < 0.1%
2021-03-25 22:33:58 2 < 0.1%
2021-03-21 20:30:13 1 < 0.1%
2021-03-22 08:39:14 1 < 0.1%
2021-03-22 08:06:37 1 < 0.1%
2021-03-22 00:22:47 1 < 0.1%
2021-03-21 23:03:19 1 < 0.1%
2021-03-21 23:02:58 1 < 0.1%
Other values (1414) 1414 4.5%
(Missing) 29495 94.2%
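
The sample and common values show creationTs mixing at least three encodings: epoch seconds written as floats (1622792619.0), space-separated date-times (2021-03-16 12:17:57), and ISO 8601 with a trailing Z (2021-03-19T20:11:57Z). That mix is a likely reason the column is typed as Categorical. A minimal normalisation sketch follows; the helper names are hypothetical, treating the space-separated form as UTC is an assumption, and reading the frequent 1970-01-01 01:00:44 value as a placeholder is a guess rather than something the report states.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_task_timestamp(raw: str) -> Optional[datetime]:
    """Parse one of the three timestamp encodings seen in the report."""
    text = raw.strip()
    if not text:
        return None
    # Epoch seconds stored as a float string, e.g. "1622792619.0".
    try:
        return datetime.fromtimestamp(float(text), tz=timezone.utc)
    except ValueError:
        pass
    # ISO 8601 with a trailing "Z", e.g. "2021-03-19T20:11:57Z".
    if text.endswith("Z"):
        parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
        return parsed.replace(tzinfo=timezone.utc)
    # Space-separated form; its timezone is not stated, UTC is assumed here.
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)


def looks_like_placeholder(ts: datetime) -> bool:
    """Flag values dated 1970, which may be sentinels rather than real times."""
    return ts.year == 1970


for value in ("1622792619.0", "2021-03-19T20:11:57Z", "1970-01-01 01:00:44"):
    ts = parse_task_timestamp(value)
    print(value, "->", ts.isoformat(), "placeholder?", looks_like_placeholder(ts))
```

The same function should apply to lastUpdateTs and closeTs, which show the same three encodings.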

Length

Histogram of lengths of the category
Value Count Frequency (%)
1970-01-01 402
13.2%
01:00:44 402
13.2%
2021-03-16 134
4.4%
2021-03-15 86
2.8%
2021-03-17 83
2.7%
2021-03-25 77
2.5%
2021-03-18 55
1.8%
2021-03-19 54
1.8%
2021-03-22 53
1.7%
2021-03-26 46
1.5%
Other values (1426) 1665
54.5%

Most occurring characters

Value Count Frequency (%)
0 6089
18.8%
1 5197
16.1%
2 4434
13.7%
- 2912
9.0%
: 2912
9.0%
3 2128
6.6%
4 1621
5.0%
(space) 1229
3.8%
9 1151
3.6%
6 1058
3.3%
Other values (6) 3624
11.2%

Most occurring categories

Value Count Frequency (%)
Decimal Number 24476
75.6%
Other Punctuation 3284
10.1%
Dash Punctuation 2912
9.0%
Space Separator 1229
3.8%
Uppercase Letter 454
1.4%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
0 6089
24.9%
1 5197
21.2%
2 4434
18.1%
3 2128
8.7%
4 1621
6.6%
9 1151
4.7%
6 1058
4.3%
7 1031
4.2%
5 1017
4.2%
8 750
3.1%
Other Punctuation
Value Count Frequency (%)
: 2912
88.7%
. 372
11.3%
Uppercase Letter
Value Count Frequency (%)
T 227
50.0%
Z 227
50.0%
Dash Punctuation
Value Count Frequency (%)
- 2912
100.0%
Space Separator
Value Count Frequency (%)
(space) 1229
100.0%

Most occurring scripts

Value Count Frequency (%)
Common 31901
98.6%
Latin 454
1.4%

Most frequent character per script

Common
Value Count Frequency (%)
0 6089
19.1%
1 5197
16.3%
2 4434
13.9%
- 2912
9.1%
: 2912
9.1%
3 2128
6.7%
4 1621
5.1%
(space) 1229
3.9%
9 1151
3.6%
6 1058
3.3%
Other values (4) 3170
9.9%
Latin
Value Count Frequency (%)
T 227
50.0%
Z 227
50.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 32355
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
0 6089
18.8%
1 5197
16.1%
2 4434
13.7%
- 2912
9.0%
: 2912
9.0%
3 2128
6.6%
4 1621
5.0%
(space) 1229
3.8%
9 1151
3.6%
6 1058
3.3%
Other values (6) 3624
11.2%

lastUpdateTs
Categorical

HIGH CARDINALITY
MISSING

Timestamp of last update of the task

Distinct 1425
Distinct (%) 78.0%
Missing 29495
Missing (%) 94.2%
Memory size 244.8 KiB
1970-01-01 01:00:44 402
2021-03-17 16:36:45 2
2021-03-19T19:40:41Z 2
2021-03-21 20:35:29 1
2021-03-26 00:43:38 1
Other values (1420) 1420

Length

Max length 20
Median length 19
Mean length 17.6963895
Min length 12

Characters and Unicode

Total characters 32349
Distinct characters 16
Distinct categories 5
Distinct scripts 2
Distinct blocks 1

Unique

Unique 1422
Unique (%) 77.8%

Sample

1st row 1622792846.0
2nd row 1622805752.0
3rd row 1622795461.0
4th row 1622796146.0
5th row 1622796374.0

Common Values

Value Count Frequency (%)
1970-01-01 01:00:44 402 1.3%
2021-03-17 16:36:45 2 < 0.1%
2021-03-19T19:40:41Z 2 < 0.1%
2021-03-21 20:35:29 1 < 0.1%
2021-03-26 00:43:38 1 < 0.1%
2021-03-22 00:34:00 1 < 0.1%
2021-03-22 09:15:28 1 < 0.1%
2021-03-22 12:37:24 1 < 0.1%
2021-03-21 23:15:19 1 < 0.1%
2021-03-21 21:07:05 1 < 0.1%
Other values (1415) 1415 4.5%
(Missing) 29495 94.2%

Length

Histogram of lengths of the category
Value Count Frequency (%)
1970-01-01 402
13.2%
01:00:44 402
13.2%
2021-03-16 106
3.5%
2021-03-17 91
3.0%
2021-03-15 75
2.5%
2021-03-28 72
2.4%
2021-03-26 67
2.2%
2021-03-25 61
2.0%
2021-03-18 53
1.7%
2021-03-20 50
1.6%
Other values (1432) 1678
54.9%

Most occurring characters

Value Count Frequency (%)
0 6085
18.8%
1 5050
15.6%
2 4490
13.9%
- 2912
9.0%
: 2910
9.0%
3 2182
6.7%
4 1683
5.2%
(space) 1229
3.8%
9 1162
3.6%
6 1041
3.2%
Other values (6) 3605
11.1%

Most occurring categories

Value Count Frequency (%)
Decimal Number 24472
75.6%
Other Punctuation 3282
10.1%
Dash Punctuation 2912
9.0%
Space Separator 1229
3.8%
Uppercase Letter 454
1.4%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
0 6085
24.9%
1 5050
20.6%
2 4490
18.3%
3 2182
8.9%
4 1683
6.9%
9 1162
4.7%
6 1041
4.3%
7 1033
4.2%
5 974
4.0%
8 772
3.2%
Other Punctuation
Value Count Frequency (%)
: 2910
88.7%
. 372
11.3%
Uppercase Letter
Value Count Frequency (%)
T 227
50.0%
Z 227
50.0%
Dash Punctuation
Value Count Frequency (%)
- 2912
100.0%
Space Separator
Value Count Frequency (%)
(space) 1229
100.0%

Most occurring scripts

Value Count Frequency (%)
Common 31895
98.6%
Latin 454
1.4%

Most frequent character per script

Common
Value Count Frequency (%)
0 6085
19.1%
1 5050
15.8%
2 4490
14.1%
- 2912
9.1%
: 2910
9.1%
3 2182
6.8%
4 1683
5.3%
(space) 1229
3.9%
9 1162
3.6%
6 1041
3.3%
Other values (4) 3151
9.9%
Latin
Value Count Frequency (%)
T 227
50.0%
Z 227
50.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 32349
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
0 6085
18.8%
1 5050
15.6%
2 4490
13.9%
- 2912
9.0%
: 2910
9.0%
3 2182
6.7%
4 1683
5.2%
(space) 1229
3.8%
9 1162
3.6%
6 1041
3.2%
Other values (6) 3605
11.1%

taskTypeId
Categorical

CONSTANT
MISSING
REJECTED

The type of the task

Distinct 1
Distinct (%) 0.1%
Missing 29495
Missing (%) 94.2%
Memory size 244.8 KiB
ask4help 1828

Length

Max length 8
Median length 8
Mean length 8
Min length 8

Characters and Unicode

Total characters 14624
Distinct characters 8
Distinct categories 2
Distinct scripts 2
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row ask4help
2nd row ask4help
3rd row ask4help
4th row ask4help
5th row ask4help

Common Values

Value Count Frequency (%)
ask4help 1828 5.8%
(Missing) 29495 94.2%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
ask4help 1828
100.0%

Most occurring characters

Value Count Frequency (%)
a 1828
12.5%
s 1828
12.5%
k 1828
12.5%
4 1828
12.5%
h 1828
12.5%
e 1828
12.5%
l 1828
12.5%
p 1828
12.5%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 12796
87.5%
Decimal Number 1828
12.5%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
a 1828
14.3%
s 1828
14.3%
k 1828
14.3%
h 1828
14.3%
e 1828
14.3%
l 1828
14.3%
p 1828
14.3%
Decimal Number
Value Count Frequency (%)
4 1828
100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 12796
87.5%
Common 1828
12.5%

Most frequent character per script

Latin
Value Count Frequency (%)
a 1828
14.3%
s 1828
14.3%
k 1828
14.3%
h 1828
14.3%
e 1828
14.3%
l 1828
14.3%
p 1828
14.3%
Common
Value Count Frequency (%)
4 1828
100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 14624
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
a 1828
12.5%
s 1828
12.5%
k 1828
12.5%
4 1828
12.5%
h 1828
12.5%
e 1828
12.5%
l 1828
12.5%
p 1828
12.5%

requesterId
Real number (ℝ ≥0 )

HIGH CORRELATION
MISSING

The id of the user who created the task (asked the question)

Distinct 175
Distinct (%) 9.6%
Missing 29495
Missing (%) 94.2%
Infinite 0
Infinite (%) 0.0%
Mean 160.7221007
Minimum 5
Maximum 289
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 244.8 KiB

Quantile statistics

Minimum 5
5-th percentile 49
Q1 108
median 149
Q3 214
95-th percentile 279
Maximum 289
Range 284
Interquartile range (IQR) 106

Descriptive statistics

Standard deviation 68.09998134
Coefficient of variation (CV) 0.4237126137
Kurtosis -0.8449861253
Mean 160.7221007
Median Absolute Deviation (MAD) 55
Skewness 0.1500354859
Sum 293800
Variance 4637.607459
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
94 72
0.2%
127 58
0.2%
279 47
0.2%
247 41
0.1%
49 40
0.1%
92 37
0.1%
135 37
0.1%
258 36
0.1%
142 33
0.1%
107 32
0.1%
Other values (165) 1395
4.5%
(Missing) 29495
94.2%
Value Count Frequency (%)
5 4
< 0.1%
8 2
< 0.1%
10 3
< 0.1%
20 3
< 0.1%
22 1
< 0.1%
24 3
< 0.1%
37 19
0.1%
40 6
< 0.1%
41 15
< 0.1%
44 3
< 0.1%
Value Count Frequency (%)
289 6
< 0.1%
288 6
< 0.1%
287 4
< 0.1%
284 5
< 0.1%
283 22
0.1%
282 14
< 0.1%
280 11
< 0.1%
279 47
0.2%
277 4
< 0.1%
276 3
< 0.1%

appId
Categorical

HIGH CORRELATION
MISSING

The chatbot deployment

Distinct 5
Distinct (%) 0.3%
Missing 29495
Missing (%) 94.2%
Memory size 244.8 KiB
GnYi1gZEcv 570
2kUw54aeVP 402
dLAIbwQczK 372
9tF0K1T7Rr 257
jFLFXPUDz4 227

Length

Max length 10
Median length 10
Mean length 10
Min length 10

Characters and Unicode

Total characters 18280
Distinct characters 39
Distinct categories 3
Distinct scripts 2
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row dLAIbwQczK
2nd row dLAIbwQczK
3rd row dLAIbwQczK
4th row dLAIbwQczK
5th row dLAIbwQczK

Common Values

Value Count Frequency (%)
GnYi1gZEcv 570 1.8%
2kUw54aeVP 402 1.3%
dLAIbwQczK 372 1.2%
9tF0K1T7Rr 257 0.8%
jFLFXPUDz4 227 0.7%
(Missing) 29495 94.2%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
gnyi1gzecv 570
31.2%
2kuw54aevp 402
22.0%
dlaibwqczk 372
20.4%
9tf0k1t7rr 257
14.1%
jflfxpudz4 227
12.4%

Most occurring characters

Value Count Frequency (%)
c 942
5.2%
1 827
4.5%
w 774
4.2%
F 711
3.9%
P 629
3.4%
U 629
3.4%
K 629
3.4%
4 629
3.4%
z 599
3.3%
L 599
3.3%
Other values (29) 11312
61.9%

Most occurring categories

Value Count Frequency (%)
Uppercase Letter 7963
43.6%
Lowercase Letter 7286
39.9%
Decimal Number 3031
16.6%

Most frequent character per category

Uppercase Letter
Value Count Frequency (%)
F 711
8.9%
P 629
7.9%
U 629
7.9%
K 629
7.9%
L 599
7.5%
G 570
7.2%
E 570
7.2%
Y 570
7.2%
Z 570
7.2%
V 402
5.0%
Other values (7) 2084
26.2%
Lowercase Letter
Value Count Frequency (%)
c 942
12.9%
w 774
10.6%
z 599
8.2%
n 570
7.8%
v 570
7.8%
i 570
7.8%
g 570
7.8%
k 402
5.5%
a 402
5.5%
e 402
5.5%
Other values (5) 1485
20.4%
Decimal Number
Value Count Frequency (%)
1 827
27.3%
4 629
20.8%
2 402
13.3%
5 402
13.3%
9 257
8.5%
0 257
8.5%
7 257
8.5%

Most occurring scripts

Value Count Frequency (%)
Latin 15249
83.4%
Common 3031
16.6%

Most frequent character per script

Latin
Value Count Frequency (%)
c 942
6.2%
w 774
5.1%
F 711
4.7%
P 629
4.1%
U 629
4.1%
K 629
4.1%
z 599
3.9%
L 599
3.9%
n 570
3.7%
G 570
3.7%
Other values (22) 8597
56.4%
Common
Value Count Frequency (%)
1 827
27.3%
4 629
20.8%
2 402
13.3%
5 402
13.3%
9 257
8.5%
0 257
8.5%
7 257
8.5%

Most occurring blocks

Value Count Frequency (%)
ASCII 18280
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
c 942
5.2%
1 827
4.5%
w 774
4.2%
F 711
3.9%
P 629
3.4%
U 629
3.4%
K 629
3.4%
4 629
3.4%
z 599
3.3%
L 599
3.3%
Other values (29) 11312
61.9%

closeTs
Categorical

HIGH CARDINALITY
MISSING

Closing timestamp of the task (if the task had an accepted answer)

Distinct 917
Distinct (%) 76.5%
Missing 30125
Missing (%) 96.2%
Memory size 244.8 KiB
1970-01-01 01:00:44 280
2021-03-19T20:02:51Z 2
2021-03-16 19:41:01 2
2021-03-22 23:46:25 1
2021-03-23 17:48:37 1
Other values (912) 912

Length

Max length 20
Median length 19
Mean length 17.86560935
Min length 12

Characters and Unicode

Total characters 21403
Distinct characters 16
Distinct categories 5
Distinct scripts 2
Distinct blocks 1

Unique

Unique 914
Unique (%) 76.3%

Sample

1st row 1622792846.0
2nd row 1622805752.0
3rd row 1622795461.0
4th row 1622796146.0
5th row 1622796374.0

Common Values

Value Count Frequency (%)
1970-01-01 01:00:44 280 0.9%
2021-03-19T20:02:51Z 2 < 0.1%
2021-03-16 19:41:01 2 < 0.1%
2021-03-22 23:46:25 1 < 0.1%
2021-03-23 17:48:37 1 < 0.1%
2021-03-23 14:03:20 1 < 0.1%
2021-03-23 11:00:02 1 < 0.1%
2021-03-23 13:11:57 1 < 0.1%
2021-03-22 20:42:14 1 < 0.1%
2021-03-22 20:57:20 1 < 0.1%
Other values (907) 907 2.9%
(Missing) 30125 96.2%

Length

Histogram of lengths of the category
Value Count Frequency (%)
1970-01-01 280
14.1%
01:00:44 280
14.1%
2021-03-16 83
4.2%
2021-03-15 67
3.4%
2021-03-17 56
2.8%
2021-03-25 40
2.0%
2021-03-22 34
1.7%
2021-03-19 32
1.6%
2021-03-28 29
1.5%
2021-03-26 28
1.4%
Other values (925) 1062
53.3%

Most occurring characters

Value Count Frequency (%)
0 4004
18.7%
1 3439
16.1%
2 2909
13.6%
- 1958
9.1%
: 1954
9.1%
3 1404
6.6%
4 1138
5.3%
(space) 793
3.7%
9 788
3.7%
7 675
3.2%
Other values (6) 2341
10.9%

Most occurring categories

Value Count Frequency (%)
Decimal Number 16107
75.3%
Other Punctuation 2173
10.2%
Dash Punctuation 1958
9.1%
Space Separator 793
3.7%
Uppercase Letter 372
1.7%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
0 4004
24.9%
1 3439
21.4%
2 2909
18.1%
3 1404
8.7%
4 1138
7.1%
9 788
4.9%
7 675
4.2%
6 648
4.0%
5 613
3.8%
8 489
3.0%
Other Punctuation
Value Count Frequency (%)
: 1954
89.9%
. 219
10.1%
Uppercase Letter
Value Count Frequency (%)
T 186
50.0%
Z 186
50.0%
Dash Punctuation
Value Count Frequency (%)
- 1958
100.0%
Space Separator
Value Count Frequency (%)
(space) 793
100.0%

Most occurring scripts

Value Count Frequency (%)
Common 21031
98.3%
Latin 372
1.7%

Most frequent character per script

Common
Value Count Frequency (%)
0 4004
19.0%
1 3439
16.4%
2 2909
13.8%
- 1958
9.3%
: 1954
9.3%
3 1404
6.7%
4 1138
5.4%
(space) 793
3.8%
9 788
3.7%
7 675
3.2%
Other values (4) 1969
9.4%
Latin
Value Count Frequency (%)
T 186
50.0%
Z 186
50.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 21403
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
0 4004
18.7%
1 3439
16.1%
2 2909
13.6%
- 1958
9.1%
: 1954
9.1%
3 1404
6.6%
4 1138
5.3%
(space) 793
3.7%
9 788
3.7%
7 675
3.2%
Other values (6) 2341
10.9%

communityId
Categorical

HIGH CORRELATION
MISSING

The Id of the users in the community

Distinct 5
Distinct (%) 0.3%
Missing 29495
Missing (%) 94.2%
Memory size 244.8 KiB
602ed55417d9ef4ce4606c8e 570
604a84d318e39d441649e96c 402
602ed55317d9ef4ce4606c8d 372
602ed55217d9ef4ce4606c8b 257
602ed55417d9ef4ce4606c8f 227

Length

Max length 24
Median length 24
Mean length 24
Min length 24

Characters and Unicode

Total characters 43872
Distinct characters 16
Distinct categories 2
Distinct scripts 2
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 602ed55317d9ef4ce4606c8d
2nd row 602ed55317d9ef4ce4606c8d
3rd row 602ed55317d9ef4ce4606c8d
4th row 602ed55317d9ef4ce4606c8d
5th row 602ed55317d9ef4ce4606c8d

Common Values

Value Count Frequency (%)
602ed55417d9ef4ce4606c8e 570 1.8%
604a84d318e39d441649e96c 402 1.3%
602ed55317d9ef4ce4606c8d 372 1.2%
602ed55217d9ef4ce4606c8b 257 0.8%
602ed55417d9ef4ce4606c8f 227 0.7%
(Missing) 29495 94.2%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
602ed55417d9ef4ce4606c8e 570
31.2%
604a84d318e39d441649e96c 402
22.0%
602ed55317d9ef4ce4606c8d 372
20.4%
602ed55217d9ef4ce4606c8b 257
14.1%
602ed55417d9ef4ce4606c8f 227
12.4%

Most occurring characters

Value Count Frequency (%)
4 5659
12.9%
e 5652
12.9%
6 5484
12.5%
d 4028
9.2%
0 3254
7.4%
c 3254
7.4%
5 2852
6.5%
9 2632
6.0%
1 2230
5.1%
8 2230
5.1%
Other values (6) 6597
15.0%

Most occurring categories

Value Count Frequency (%)
Decimal Number 28626
65.2%
Lowercase Letter 15246
34.8%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
4 5659
19.8%
6 5484
19.2%
0 3254
11.4%
5 2852
10.0%
9 2632
9.2%
1 2230
7.8%
8 2230
7.8%
2 1683
5.9%
7 1426
5.0%
3 1176
4.1%
Lowercase Letter
Value Count Frequency (%)
e 5652
37.1%
d 4028
26.4%
c 3254
21.3%
f 1653
10.8%
a 402
2.6%
b 257
1.7%

Most occurring scripts

Value Count Frequency (%)
Common 28626
65.2%
Latin 15246
34.8%

Most frequent character per script

Common
Value Count Frequency (%)
4 5659
19.8%
6 5484
19.2%
0 3254
11.4%
5 2852
10.0%
9 2632
9.2%
1 2230
7.8%
8 2230
7.8%
2 1683
5.9%
7 1426
5.0%
3 1176
4.1%
Latin
Value Count Frequency (%)
e 5652
37.1%
d 4028
26.4%
c 3254
21.3%
f 1653
10.8%
a 402
2.6%
b 257
1.7%

Most occurring blocks

Value Count Frequency (%)
ASCII 43872
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
4 5659
12.9%
e 5652
12.9%
6 5484
12.5%
d 4028
9.2%
0 3254
7.4%
c 3254
7.4%
5 2852
6.5%
9 2632
6.0%
1 2230
5.1%
8 2230
5.1%
Other values (6) 6597
15.0%

transactions.taskId
Categorical

HIGH CARDINALITY
MISSING

The task’s id

Distinct 1828
Distinct (%) 14.0%
Missing 18281
Missing (%) 58.4%
Memory size 244.8 KiB
6051c4496d62bf159d1d7c55
25
60bfdcbb7061826a7a969070
24
60c1dbb37061826a7a96908c
23
604f47376d62bf159d1d7afc
21
605ddea96d62bf159d1d7fe0
19
Other values (1823)
12930

Length

Max length 24
Median length 24
Mean length 24
Min length 24

Characters and Unicode

Total characters 313008
Distinct characters 16
Distinct categories 2
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 2
Unique (%) < 0.1%

Sample

1st row 60b9d9aa7061826a7a968f90
2nd row 60b9d9aa7061826a7a968f90
3rd row 60b9d9aa7061826a7a968f90
4th row 60b9d9aa7061826a7a968f90
5th row 60b9d9aa7061826a7a968f90

Common Values

Value Count Frequency (%)
6051c4496d62bf159d1d7c55 25
0.1%
60bfdcbb7061826a7a969070 24
0.1%
60c1dbb37061826a7a96908c 23
0.1%
604f47376d62bf159d1d7afc 21
0.1%
605ddea96d62bf159d1d7fe0 19
0.1%
6051bfbe6d62bf159d1d7c4b 19
0.1%
605254d96d62bf159d1d7ca5 19
0.1%
605d480c6d62bf159d1d7fb1 18
0.1%
604f7eb26d62bf159d1d7b41 18
0.1%
6051abfc6d62bf159d1d7c35 18
0.1%
Other values (1818) 12838
41.0%
(Missing) 18281
58.4%

Length

Histogram of lengths of the category
Value Count Frequency (%)
6051c4496d62bf159d1d7c55 25
0.2%
60bfdcbb7061826a7a969070 24
0.2%
60c1dbb37061826a7a96908c 23
0.2%
604f47376d62bf159d1d7afc 21
0.2%
605ddea96d62bf159d1d7fe0 19
0.1%
6051bfbe6d62bf159d1d7c4b 19
0.1%
605254d96d62bf159d1d7ca5 19
0.1%
604f52c76d62bf159d1d7b0b 18
0.1%
605470fb6d62bf159d1d7d56 18
0.1%
60507ff66d62bf159d1d7b9c 18
0.1%
Other values (1818) 12838
98.4%

Most occurring characters

Value Count Frequency (%)
6 47309
15.1%
d 38762
12.4%
1 29206
9.3%
5 25102
8.0%
0 24490
7.8%
9 20524
6.6%
7 20438
6.5%
b 20098
6.4%
f 19379
6.2%
2 18583
5.9%
Other values (6) 49117
15.7%

Most occurring categories

Value Count Frequency (%)
Decimal Number 207745
66.4%
Lowercase Letter 105263
33.6%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
6 47309
22.8%
1 29206
14.1%
5 25102
12.1%
0 24490
11.8%
9 20524
9.9%
7 20438
9.8%
2 18583
8.9%
8 9161
4.4%
4 7158
3.4%
3 5774
2.8%
Lowercase Letter
Value Count Frequency (%)
d 38762
36.8%
b 20098
19.1%
f 19379
18.4%
a 11604
11.0%
c 8580
8.2%
e 6840
6.5%

Most occurring scripts

Value Count Frequency (%)
Common 207745
66.4%
Latin 105263
33.6%

Most frequent character per script

Common
Value Count Frequency (%)
6 47309
22.8%
1 29206
14.1%
5 25102
12.1%
0 24490
11.8%
9 20524
9.9%
7 20438
9.8%
2 18583
8.9%
8 9161
4.4%
4 7158
3.4%
3 5774
2.8%
Latin
Value Count Frequency (%)
d 38762
36.8%
b 20098
19.1%
f 19379
18.4%
a 11604
11.0%
c 8580
8.2%
e 6840
6.5%

Most occurring blocks

Value Count Frequency (%)
ASCII 313008
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
6 47309
15.1%
d 38762
12.4%
1 29206
9.3%
5 25102
8.0%
0 24490
7.8%
9 20524
6.6%
7 20438
6.5%
b 20098
6.4%
f 19379
6.2%
2 18583
5.9%
Other values (6) 49117
15.7%

transactions.label
Categorical

HIGH CORRELATION
MISSING

Label of action on task (answerTransaction, bestAnswerTransaction, CREATE_TASK, moreAnswerTransaction, notAnswerTransaction, reportAnswerTransaction, reportQuestionTransaction)

Distinct 7
Distinct (%) 0.1%
Missing 18281
Missing (%) 58.4%
Memory size 244.8 KiB
answerTransaction
7646
notAnswerTransaction
2015
CREATE_TASK
1828
bestAnswerTransaction
1203
moreAnswerTransaction
343
Other values (2)
7

Length

Max length 25
Median length 17
Mean length 17.10082809
Min length 11

Characters and Unicode

Total characters 223029
Distinct characters 23
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1
Unique (%) < 0.1%

Sample

1st row CREATE_TASK
2nd row notAnswerTransaction
3rd row notAnswerTransaction
4th row notAnswerTransaction
5th row notAnswerTransaction

Common Values

Value Count Frequency (%)
answerTransaction 7646
24.4%
notAnswerTransaction 2015
6.4%
CREATE_TASK 1828
5.8%
bestAnswerTransaction 1203
3.8%
moreAnswerTransaction 343
1.1%
reportQuestionTransaction 6
< 0.1%
reportAnswerTransaction 1
< 0.1%
(Missing) 18281
58.4%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
answertransaction 7646
58.6%
notanswertransaction 2015
15.5%
create_task 1828
14.0%
bestanswertransaction 1203
9.2%
moreanswertransaction 343
2.6%
reportquestiontransaction 6
< 0.1%
reportanswertransaction 1
< 0.1%
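
The two frequency tables for this variable appear to use different denominators: Common Values divides by all 31323 observations, while the Category Frequency Plot table divides by the non-missing rows only. The short check below tests that reading for answerTransaction, using only figures from this report. The variable names are illustrative.

```python
TOTAL_ROWS = 31323   # "Number of observations" in the overview
MISSING = 18281      # missing cells for transactions.label
COUNT = 7646         # rows labelled answerTransaction

non_missing = TOTAL_ROWS - MISSING

# Share of all rows, as in the Common Values table.
share_of_all = 100 * COUNT / TOTAL_ROWS
# Share of rows that have a label, as in the Category Frequency Plot table.
share_of_present = 100 * COUNT / non_missing

print(non_missing, round(share_of_all, 1), round(share_of_present, 1))
```

When comparing variables with very different missing rates, the non-missing share is usually the more informative of the two.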

Most occurring characters

Value Count Frequency (%)
n 35657
16.0%
a 30074
13.5%
s 23631
10.6%
r 22779
10.2%
T 14870
6.7%
t 14445
6.5%
o 13585
6.1%
e 12767
5.7%
i 11220
5.0%
c 11214
5.0%
Other values (13) 32787
14.7%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 188139
84.4%
Uppercase Letter 33062
14.8%
Connector Punctuation 1828
0.8%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
n 35657
19.0%
a 30074
16.0%
s 23631
12.6%
r 22779
12.1%
t 14445
7.7%
o 13585
7.2%
e 12767
6.8%
i 11220
6.0%
c 11214
6.0%
w 11208
6.0%
Other values (4) 1559
0.8%
Uppercase Letter
Value Count Frequency (%)
T 14870
45.0%
A 7218
21.8%
E 3656
11.1%
C 1828
5.5%
R 1828
5.5%
S 1828
5.5%
K 1828
5.5%
Q 6
< 0.1%
Connector Punctuation
Value Count Frequency (%)
_ 1828
100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 221201
99.2%
Common 1828
0.8%

Most frequent character per script

Latin
Value Count Frequency (%)
n 35657
16.1%
a 30074
13.6%
s 23631
10.7%
r 22779
10.3%
T 14870
6.7%
t 14445
6.5%
o 13585
6.1%
e 12767
5.8%
i 11220
5.1%
c 11214
5.1%
Other values (12) 30959
14.0%
Common
Value Count Frequency (%)
_ 1828
100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 223029
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
n 35657
16.0%
a 30074
13.5%
s 23631
10.6%
r 22779
10.2%
T 14870
6.7%
t 14445
6.5%
o 13585
6.1%
e 12767
5.7%
i 11220
5.0%
c 11214
5.0%
Other values (13) 32787
14.7%

transactions.creationTs
Categorical

HIGH CARDINALITY
MISSING

Creation timestamp of transaction

Distinct 9858
Distinct (%) 75.6%
Missing 18281
Missing (%) 58.4%
Memory size 244.8 KiB
1970-01-01 01:00:44
3077
1622806950.0
3
2021-03-23T15:40:14Z
2
2021-03-19T20:02:34Z
2
2021-03-17 16:38:33
2
Other values (9853)
9956

Length

Max length 20
Median length 19
Mean length 17.76989726
Min length 12

Characters and Unicode

Total characters 231755
Distinct characters 16
Distinct categories 5
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 9750
Unique (%) 74.8%

Sample

1st row 1622792619.0
2nd row 1622792660.0
3rd row 1622792677.0
4th row 1622792678.0
5th row 1622792697.0

Common Values

Value Count Frequency (%)
1970-01-01 01:00:44 3077
9.8%
1622806950.0 3
< 0.1%
2021-03-23T15:40:14Z 2
< 0.1%
2021-03-19T20:02:34Z 2
< 0.1%
2021-03-17 16:38:33 2
< 0.1%
2021-03-19T20:05:38Z 2
< 0.1%
2021-03-19T18:41:46Z 2
< 0.1%
2021-03-15 22:28:32 2
< 0.1%
2021-03-19T18:39:48Z 2
< 0.1%
2021-03-16 12:17:57 2
< 0.1%
Other values (9848) 9946
31.8%
(Missing) 18281
58.4%
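
The values above mix at least three timestamp encodings: epoch seconds stored as a float string (`1622792619.0`), ISO 8601 with a `Z` suffix (`2021-03-23T15:40:14Z`), and `YYYY-MM-DD HH:MM:SS` without a zone. This is presumably why the column is typed as Categorical. The sketch below shows one way such a column could be normalised before analysis. It rests on two assumptions that the report does not confirm: that `1970-01-01 01:00:44` is a placeholder rather than a real event time, and that zone-less strings are in UTC. `parse_ts` is a hypothetical helper name.

```python
from datetime import datetime, timezone

# Assumption: this value (3077 occurrences) is a placeholder, not a real time.
SENTINEL = "1970-01-01 01:00:44"


def parse_ts(raw):
    """Return a timezone-aware UTC datetime, or None for missing/placeholder."""
    if raw is None or raw == SENTINEL:
        return None
    try:
        # Epoch seconds stored as text, e.g. "1622792619.0".
        return datetime.fromtimestamp(float(raw), tz=timezone.utc)
    except ValueError:
        pass
    if raw.endswith("Z"):
        # ISO 8601 with an explicit UTC marker.
        parsed = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%SZ")
    else:
        # Assumption: zone-less strings are treated as UTC.
        parsed = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)


for raw in ["1622792619.0", "2021-03-23T15:40:14Z", "2021-03-17 16:38:33", SENTINEL]:
    print(raw, "->", parse_ts(raw))
```

If the zone-less strings turn out to be local time rather than UTC, only the final branch needs to change.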

Length

Histogram of lengths of the category
Value Count Frequency (%)
1970-01-01 3077
13.8%
01:00:44 3077
13.8%
2021-03-16 1017
4.5%
2021-03-15 651
2.9%
2021-03-17 650
2.9%
2021-03-25 551
2.5%
2021-03-26 427
1.9%
2021-03-20 388
1.7%
2021-03-28 376
1.7%
2021-03-22 370
1.7%
Other values (9594) 11783
52.7%

Most occurring characters

Value Count Frequency (%)
0 44378
19.1%
1 37126
16.0%
2 31364
13.5%
- 21156
9.1%
: 21140
9.1%
3 15247
6.6%
4 11963
5.2%
(space) 9325
4.0%
9 7981
3.4%
7 7573
3.3%
Other values (6) 24502
10.6%

Most occurring categories

Value Count Frequency (%)
Decimal Number 175164
75.6%
Other Punctuation 23604
10.2%
Dash Punctuation 21156
9.1%
Space Separator 9325
4.0%
Uppercase Letter 2506
1.1%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
0 44378
25.3%
1 37126
21.2%
2 31364
17.9%
3 15247
8.7%
4 11963
6.8%
9 7981
4.6%
7 7573
4.3%
6 7310
4.2%
5 6985
4.0%
8 5237
3.0%
Other Punctuation
Value Count Frequency (%)
: 21140
89.6%
. 2464
10.4%
Uppercase Letter
Value Count Frequency (%)
T 1253
50.0%
Z 1253
50.0%
Dash Punctuation
Value Count Frequency (%)
- 21156
100.0%
Space Separator
Value Count Frequency (%)
(space) 9325
100.0%

Most occurring scripts

Value Count Frequency (%)
Common 229249
98.9%
Latin 2506
1.1%

Most frequent character per script

Common
Value Count Frequency (%)
0 44378
19.4%
1 37126
16.2%
2 31364
13.7%
- 21156
9.2%
: 21140
9.2%
3 15247
6.7%
4 11963
5.2%
(space) 9325
4.1%
9 7981
3.5%
7 7573
3.3%
Other values (4) 21996
9.6%
Latin
Value Count Frequency (%)
T 1253
50.0%
Z 1253
50.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 231755
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
0 44378
19.1%
1 37126
16.0%
2 31364
13.5%
- 21156
9.1%
: 21140
9.1%
3 15247
6.6%
4 11963
5.2%
(space) 9325
4.0%
9 7981
3.4%
7 7573
3.3%
Other values (6) 24502
10.6%

transactions.lastUpdateTs
Categorical

HIGH CARDINALITY
MISSING

Last update timestamp of transaction

Distinct 9850
Distinct (%) 75.5%
Missing 18281
Missing (%) 58.4%
Memory size 244.8 KiB
1970-01-01 01:00:44
3077
2021-03-25 15:51:32
3
2021-03-23T15:40:14Z
2
2021-03-25T14:35:14Z
2
2021-03-23 23:05:47
2
Other values (9845)
9956

Length

Max length 20
Median length 19
Mean length 17.76989726
Min length 12

Characters and Unicode

Total characters 231755
Distinct characters 16
Distinct categories 5
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 9734
Unique (%) 74.6%

Sample

1st row 1622792619.0
2nd row 1622792660.0
3rd row 1622792677.0
4th row 1622792678.0
5th row 1622792697.0

Common Values

Value Count Frequency (%)
1970-01-01 01:00:44 3077
9.8%
2021-03-25 15:51:32 3
< 0.1%
2021-03-23T15:40:14Z 2
< 0.1%
2021-03-25T14:35:14Z 2
< 0.1%
2021-03-23 23:05:47 2
< 0.1%
2021-03-25 22:21:17 2
< 0.1%
1622878299.0 2
< 0.1%
2021-03-19T19:58:06Z 2
< 0.1%
2021-03-25 22:33:59 2
< 0.1%
2021-03-16 17:48:14 2
< 0.1%
Other values (9840) 9946
31.8%
(Missing) 18281
58.4%

Length

Histogram of lengths of the category
Value Count Frequency (%)
1970-01-01 3077
13.8%
01:00:44 3077
13.8%
2021-03-16 1017
4.5%
2021-03-15 651
2.9%
2021-03-17 650
2.9%
2021-03-25 551
2.5%
2021-03-26 427
1.9%
2021-03-20 388
1.7%
2021-03-28 376
1.7%
2021-03-22 370
1.7%
Other values (9574) 11783
52.7%

Most occurring characters

Value Count Frequency (%)
0 44363
19.1%
1 37097
16.0%
2 31425
13.6%
- 21156
9.1%
: 21140
9.1%
3 15235
6.6%
4 11923
5.1%
(space) 9325
4.0%
9 7992
3.4%
7 7526
3.2%
Other values (6) 24573
10.6%

Most occurring categories

Value Count Frequency (%)
Decimal Number 175164
75.6%
Other Punctuation 23604
10.2%
Dash Punctuation 21156
9.1%
Space Separator 9325
4.0%
Uppercase Letter 2506
1.1%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
0 44363
25.3%
1 37097
21.2%
2 31425
17.9%
3 15235
8.7%
4 11923
6.8%
9 7992
4.6%
7 7526
4.3%
6 7346
4.2%
5 7013
4.0%
8 5244
3.0%
Other Punctuation
Value Count Frequency (%)
: 21140
89.6%
. 2464
10.4%
Uppercase Letter
Value Count Frequency (%)
T 1253
50.0%
Z 1253
50.0%
Dash Punctuation
Value Count Frequency (%)
- 21156
100.0%
Space Separator
Value Count Frequency (%)
(space) 9325
100.0%

Most occurring scripts

Value Count Frequency (%)
Common 229249
98.9%
Latin 2506
1.1%

Most frequent character per script

Common
Value Count Frequency (%)
0 44363
19.4%
1 37097
16.2%
2 31425
13.7%
- 21156
9.2%
: 21140
9.2%
3 15235
6.6%
4 11923
5.2%
(space) 9325
4.1%
9 7992
3.5%
7 7526
3.3%
Other values (4) 22067
9.6%
Latin
Value Count Frequency (%)
T 1253
50.0%
Z 1253
50.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 231755
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
0 44363
19.1%
1 37097
16.0%
2 31425
13.6%
- 21156
9.1%
: 21140
9.1%
3 15235
6.6%
4 11923
5.1%
(space) 9325
4.0%
9 7992
3.4%
7 7526
3.2%
Other values (6) 24573
10.6%

transactions.actioneerId
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Id of the user who performed the transaction action

Distinct 193
Distinct (%) 1.5%
Missing 18281
Missing (%) 58.4%
Infinite 0
Infinite (%) 0.0%
Mean 157.9104432
Minimum 5
Maximum 292
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 244.8 KiB

Quantile statistics

Minimum 5
5-th percentile 56
Q1 107
median 147
Q3 204
95-th percentile 272
Maximum 292
Range 287
Interquartile range (IQR) 97

Descriptive statistics

Standard deviation 65.30421775
Coefficient of variation (CV) 0.4135522416
Kurtosis -0.7715273085
Mean 157.9104432
Median Absolute Deviation (MAD) 44
Skewness 0.2863779979
Sum 2059468
Variance 4264.640856
Monotonicity Not monotonic
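
Several of the statistics above are simple functions of the others, which gives a quick consistency check on the report. The sketch below recomputes range, IQR, coefficient of variation, variance and mean from the published figures. All inputs are copied from this section and the overview; the variable names are illustrative.

```python
minimum, maximum = 5, 292
q1, q3 = 107, 204
mean, std = 157.9104432, 65.30421775
total = 2059468                      # "Sum"
n_rows, n_missing = 31323, 18281     # observations and missing cells

n = n_rows - n_missing               # rows that actually carry a value

value_range = maximum - minimum      # Range
iqr = q3 - q1                        # Interquartile range
cv = std / mean                      # Coefficient of variation
variance = std ** 2                  # Variance
mean_from_sum = total / n            # Mean recomputed from Sum

print(value_range, iqr, round(cv, 4), round(variance, 2), round(mean_from_sum, 4))
```

The recomputed mean matches only when the sum is divided by the non-missing count, which indicates that missing rows are excluded from these statistics.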
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
94 417
1.3%
162 298
1.0%
262 279
0.9%
84 258
0.8%
135 257
0.8%
136 250
0.8%
129 245
0.8%
151 241
0.8%
248 237
0.8%
127 223
0.7%
Other values (183) 10337
33.0%
(Missing) 18281
58.4%
Value Count Frequency (%)
5 4
< 0.1%
8 6
< 0.1%
10 11
< 0.1%
14 1
< 0.1%
20 12
< 0.1%
22 20
0.1%
24 14
< 0.1%
37 117
0.4%
40 30
0.1%
41 69
0.2%
Value Count Frequency (%)
292 6
< 0.1%
291 1
< 0.1%
289 27
0.1%
288 56
0.2%
287 23
0.1%
286 7
< 0.1%
284 25
0.1%
283 66
0.2%
282 62
0.2%
280 109
0.3%

transactions.id
Real number (ℝ≥0)

HIGH CORRELATION
MISSING
ZEROS

-

Distinct 24
Distinct (%) 1.0%
Missing 28859
Missing (%) 92.1%
Infinite 0
Infinite (%) 0.0%
Mean 3.591314935
Minimum 0
Maximum 23
Zeros 372
Zeros (%) 1.2%
Negative 0
Negative (%) 0.0%
Memory size 244.8 KiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 1
median 3
Q3 5
95-th percentile 10
Maximum 23
Range 23
Interquartile range (IQR) 4

Descriptive statistics

Standard deviation 3.27365089
Coefficient of variation (CV) 0.9115465921
Kurtosis 3.610662485
Mean 3.591314935
Median Absolute Deviation (MAD) 2
Skewness 1.529122635
Sum 8849
Variance 10.71679015
Monotonicity Not monotonic
Histogram with fixed size bins (bins=24)
Value Count Frequency (%)
0 372
1.2%
1 372
1.2%
2 370
1.2%
3 329
1.1%
4 262
0.8%
5 199
0.6%
6 152
0.5%
7 126
0.4%
8 95
0.3%
9 55
0.2%
Other values (14) 132
0.4%
(Missing) 28859
92.1%
Value Count Frequency (%)
0 372
1.2%
1 372
1.2%
2 370
1.2%
3 329
1.1%
4 262
0.8%
5 199
0.6%
6 152
0.5%
7 126
0.4%
8 95
0.3%
9 55
0.2%
Value Count Frequency (%)
23 1
< 0.1%
22 2
< 0.1%
21 2
< 0.1%
20 2
< 0.1%
19 2
< 0.1%
18 2
< 0.1%
17 3
< 0.1%
16 5
< 0.1%
15 7
< 0.1%
14 8
< 0.1%

transactions.messages.appId
Categorical

HIGH CORRELATION
MISSING

App Id of the transaction message

Distinct 5
Distinct (%) < 0.1%
Missing 8869
Missing (%) 28.3%
Memory size 244.8 KiB
dLAIbwQczK
20061
2kUw54aeVP
855
GnYi1gZEcv
710
9tF0K1T7Rr
493
jFLFXPUDz4
335

Length

Max length 10
Median length 10
Mean length 10
Min length 10

Characters and Unicode

Total characters 224540
Distinct characters 39
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row dLAIbwQczK
2nd row dLAIbwQczK
3rd row dLAIbwQczK
4th row dLAIbwQczK
5th row dLAIbwQczK

Common Values

Value Count Frequency (%)
dLAIbwQczK 20061
64.0%
2kUw54aeVP 855
2.7%
GnYi1gZEcv 710
2.3%
9tF0K1T7Rr 493
1.6%
jFLFXPUDz4 335
1.1%
(Missing) 8869
28.3%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
dlaibwqczk 20061
89.3%
2kuw54aevp 855
3.8%
gnyi1gzecv 710
3.2%
9tf0k1t7rr 493
2.2%
jflfxpudz4 335
1.5%

Most occurring characters

Value Count Frequency (%)
w 20916
9.3%
c 20771
9.3%
K 20554
9.2%
L 20396
9.1%
z 20396
9.1%
d 20061
8.9%
A 20061
8.9%
I 20061
8.9%
b 20061
8.9%
Q 20061
8.9%
Other values (29) 21202
9.4%

Most occurring categories

Value Count Frequency (%)
Uppercase Letter 110027
49.0%
Lowercase Letter 108931
48.5%
Decimal Number 5582
2.5%

Most frequent character per category

Uppercase Letter
Value Count Frequency (%)
K 20554
18.7%
L 20396
18.5%
A 20061
18.2%
I 20061
18.2%
Q 20061
18.2%
P 1190
1.1%
U 1190
1.1%
F 1163
1.1%
V 855
0.8%
Z 710
0.6%
Other values (7) 3786
3.4%
Lowercase Letter
Value Count Frequency (%)
w 20916
19.2%
c 20771
19.1%
z 20396
18.7%
d 20061
18.4%
b 20061
18.4%
k 855
0.8%
a 855
0.8%
e 855
0.8%
v 710
0.7%
g 710
0.7%
Other values (5) 2741
2.5%
Decimal Number
Value Count Frequency (%)
1 1203
21.6%
4 1190
21.3%
5 855
15.3%
2 855
15.3%
9 493
8.8%
0 493
8.8%
7 493
8.8%

Most occurring scripts

Value Count Frequency (%)
Latin 218958
97.5%
Common 5582
2.5%

Most frequent character per script

Latin
Value Count Frequency (%)
w 20916
9.6%
c 20771
9.5%
K 20554
9.4%
L 20396
9.3%
z 20396
9.3%
d 20061
9.2%
A 20061
9.2%
I 20061
9.2%
b 20061
9.2%
Q 20061
9.2%
Other values (22) 15620
7.1%
Common
Value Count Frequency (%)
1 1203
21.6%
4 1190
21.3%
5 855
15.3%
2 855
15.3%
9 493
8.8%
0 493
8.8%
7 493
8.8%

Most occurring blocks

Value Count Frequency (%)
ASCII 224540
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
w 20916
9.3%
c 20771
9.3%
K 20554
9.2%
L 20396
9.1%
z 20396
9.1%
d 20061
8.9%
A 20061
8.9%
I 20061
8.9%
b 20061
8.9%
Q 20061
8.9%
Other values (29) 21202
9.4%

transactions.messages.receiverId
Categorical

HIGH CARDINALITY
MISSING

Id of the user(s) receiving the transaction message

Distinct 1509
Distinct (%) 7.0%
Missing 9806
Missing (%) 31.3%
Memory size 244.8 KiB
279.0
481
283.0
465
282.0
450
258.0
433
241.0
422
Other values (1504)
19266

Length

Max length 236
Median length 5
Mean length 15.38011805
Min length 3

Characters and Unicode

Total characters 330934
Distinct characters 12
Distinct categories 3
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1456
Unique (%) 6.8%

Sample

1st row 1.0
2nd row 5.0
3rd row 16.0
4th row 44.0
5th row 237.0

Common Values

Value Count Frequency (%)
279.0 481
1.5%
283.0 465
1.5%
282.0 450
1.4%
258.0 433
1.4%
241.0 422
1.3%
247.0 420
1.3%
260.0 414
1.3%
262.0 413
1.3%
270.0 407
1.3%
245.0 403
1.3%
Other values (1499) 17209
54.9%
(Missing) 9806
31.3%
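
The most common values look like single user ids written as floats (`279.0`), yet the maximum length is 236 and the character statistics for this variable list `|` as the most frequent character. A plausible reading, not confirmed by the report, is that some cells hold several receiver ids joined by `|`. Under that assumption, a cell could be split into integer ids as sketched below. `parse_receivers` is a hypothetical helper, and the multi-id input is an invented example of the assumed format.

```python
def parse_receivers(raw):
    """Split one receiverId cell into a list of integer user ids.

    Assumption: multiple ids are joined with "|"; single ids may be
    written as floats such as "279.0".
    """
    if raw is None or raw == "":
        return []
    # int(float(...)) accepts both "279" and "279.0".
    return [int(float(part)) for part in raw.split("|") if part]


print(parse_receivers("279.0"))    # single id written as a float
print(parse_receivers("1|5|16"))   # invented example of the assumed list format
```

Exploding the column this way before counting would give per-user message totals instead of per-cell totals.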

Length

Histogram of lengths of the category
Value Count Frequency (%)
279.0 481
2.2%
283.0 465
2.2%
282.0 450
2.1%
258.0 433
2.0%
241.0 422
2.0%
247.0 420
2.0%
260.0 414
1.9%
262.0 413
1.9%
270.0 407
1.9%
245.0 403
1.9%
Other values (1499) 17209
80.0%

Most occurring characters

Value Count Frequency (%)
| 63074
19.1%
1 54605
16.5%
2 39066
11.8%
0 35286
10.7%
. 20061
6.1%
4 19328
5.8%
6 19177
5.8%
5 19133
5.8%
8 17772
5.4%
7 17473
5.3%
Other values (2) 25959
7.8%

Most occurring categories

Value Count Frequency (%)
Decimal Number 247799
74.9%
Math Symbol 63074
19.1%
Other Punctuation 20061
6.1%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
1 54605
22.0%
2 39066
15.8%
0 35286
14.2%
4 19328
7.8%
6 19177
7.7%
5 19133
7.7%
8 17772
7.2%
7 17473
7.1%
9 13183
5.3%
3 12776
5.2%
Math Symbol
Value Count Frequency (%)
| 63074
100.0%
Other Punctuation
Value Count Frequency (%)
. 20061
100.0%

Most occurring scripts

Value Count Frequency (%)
Common 330934
100.0%

Most frequent character per script

Common
Value Count Frequency (%)
| 63074
19.1%
1 54605
16.5%
2 39066
11.8%
0 35286
10.7%
. 20061
6.1%
4 19328
5.8%
6 19177
5.8%
5 19133
5.8%
8 17772
5.4%
7 17473
5.3%
Other values (2) 25959
7.8%

Most occurring blocks

Value Count Frequency (%)
ASCII 330934
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
| 63074
19.1%
1 54605
16.5%
2 39066
11.8%
0 35286
10.7%
. 20061
6.1%
4 19328
5.8%
6 19177
5.8%
5 19133
5.8%
8 17772
5.4%
7 17473
5.3%
Other values (2) 25959
7.8%

transactions.messages.label
Categorical

HIGH CORRELATION
MISSING

Label of transaction message (AnsweredPickedMessage, AnsweredQuestionMessage, QuestionToAnswerMessage)

Distinct 3
Distinct (%) < 0.1%
Missing 6794
Missing (%) 21.7%
Memory size 244.8 KiB
QuestionToAnswerMessage
20086
AnsweredQuestionMessage
3245
AnsweredPickedMessage
1198

Length

Max length 23
Median length 23
Mean length 22.9023197
Min length 21

Characters and Unicode

Total characters 561771
Distinct characters 19
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row QuestionToAnswerMessage
2nd row QuestionToAnswerMessage
3rd row QuestionToAnswerMessage
4th row QuestionToAnswerMessage
5th row QuestionToAnswerMessage

Common Values

Value Count Frequency (%)
QuestionToAnswerMessage 20086
64.1%
AnsweredQuestionMessage 3245
10.4%
AnsweredPickedMessage 1198
3.8%
(Missing) 6794
21.7%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
questiontoanswermessage 20086
81.9%
answeredquestionmessage 3245
13.2%
answeredpickedmessage 1198
4.9%

Most occurring characters

Value Count Frequency (%)
e 102559
18.3%
s 96918
17.3%
n 47860
8.5%
o 43417
7.7%
A 24529
4.4%
r 24529
4.4%
g 24529
4.4%
i 24529
4.4%
a 24529
4.4%
M 24529
4.4%
Other values (9) 123843
22.0%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 468098
83.3%
Uppercase Letter 93673
16.7%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
e 102559
21.9%
s 96918
20.7%
n 47860
10.2%
o 43417
9.3%
r 24529
5.2%
g 24529
5.2%
i 24529
5.2%
a 24529
5.2%
w 24529
5.2%
u 23331
5.0%
Other values (4) 31368
6.7%
Uppercase Letter
Value Count Frequency (%)
A 24529
26.2%
M 24529
26.2%
Q 23331
24.9%
T 20086
21.4%
P 1198
1.3%

Most occurring scripts

Value Count Frequency (%)
Latin 561771
100.0%

Most frequent character per script

Latin
Value Count Frequency (%)
e 102559
18.3%
s 96918
17.3%
n 47860
8.5%
o 43417
7.7%
A 24529
4.4%
r 24529
4.4%
g 24529
4.4%
i 24529
4.4%
a 24529
4.4%
M 24529
4.4%
Other values (9) 123843
22.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 561771
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
e 102559
18.3%
s 96918
17.3%
n 47860
8.5%
o 43417
7.7%
A 24529
4.4%
r 24529
4.4%
g 24529
4.4%
i 24529
4.4%
a 24529
4.4%
M 24529
4.4%
Other values (9) 123843
22.0%

transactions.messages.attributes.taskId
Categorical

HIGH CARDINALITY
MISSING

taskId of the transaction message

Distinct 1828
Distinct (%) 8.1%
Missing 8869
Missing (%) 28.3%
Memory size 244.8 KiB
60c1dbb37061826a7a96908c
74
60ca18607061826a7a9690db
66
60cb710e7061826a7a9690ed
65
60c9cba97061826a7a9690d6
64
60be1b547061826a7a969053
64
Other values (1823)
22121

Length

Max length 24
Median length 24
Mean length 24
Min length 24

Characters and Unicode

Total characters 538896
Distinct characters 16
Distinct categories 2
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 832
Unique (%) 3.7%

Sample

1st row 60b9d9aa7061826a7a968f90
2nd row 60b9d9aa7061826a7a968f90
3rd row 60b9d9aa7061826a7a968f90
4th row 60b9d9aa7061826a7a968f90
5th row 60b9d9aa7061826a7a968f90

Common Values

Value Count Frequency (%)
60c1dbb37061826a7a96908c 74
0.2%
60ca18607061826a7a9690db 66
0.2%
60cb710e7061826a7a9690ed 65
0.2%
60c9cba97061826a7a9690d6 64
0.2%
60be1b547061826a7a969053 64
0.2%
60c0eedd7061826a7a96907e 64
0.2%
60bb27b97061826a7a969013 63
0.2%
60bccfa77061826a7a96903b 63
0.2%
60c5ba597061826a7a9690b3 63
0.2%
60bfdcbb7061826a7a969070 63
0.2%
Other values (1818) 21805
69.6%
(Missing) 8869
28.3%

Length

Histogram of lengths of the category
Value Count Frequency (%)
60c1dbb37061826a7a96908c 74
0.3%
60ca18607061826a7a9690db 66
0.3%
60cb710e7061826a7a9690ed 65
0.3%
60c9cba97061826a7a9690d6 64
0.3%
60be1b547061826a7a969053 64
0.3%
60c0eedd7061826a7a96907e 64
0.3%
60bb27b97061826a7a969013 63
0.3%
60bccfa77061826a7a96903b 63
0.3%
60c5ba597061826a7a9690b3 63
0.3%
60bfdcbb7061826a7a969070 63
0.3%
Other values (1818) 21805
97.1%

Most occurring characters

Value Count Frequency (%)
6 96470
17.9%
0 66819
12.4%
a 55800
10.4%
7 51956
9.6%
9 46801
8.7%
8 34310
6.4%
1 33384
6.2%
2 31230
5.8%
b 26199
4.9%
c 19202
3.6%
Other values (6) 76725
14.2%

Most occurring categories

Value Count Frequency (%)
Decimal Number 393666
73.1%
Lowercase Letter 145230
26.9%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
6 96470
24.5%
0 66819
17.0%
7 51956
13.2%
9 46801
11.9%
8 34310
8.7%
1 33384
8.5%
2 31230
7.9%
5 12738
3.2%
4 10496
2.7%
3 9462
2.4%
Lowercase Letter
Value Count Frequency (%)
a 55800
38.4%
b 26199
18.0%
c 19202
13.2%
f 17398
12.0%
d 16446
11.3%
e 10185
7.0%

Most occurring scripts

Value Count Frequency (%)
Common 393666
73.1%
Latin 145230
26.9%

Most frequent character per script

Common
Value Count Frequency (%)
6 96470
24.5%
0 66819
17.0%
7 51956
13.2%
9 46801
11.9%
8 34310
8.7%
1 33384
8.5%
2 31230
7.9%
5 12738
3.2%
4 10496
2.7%
3 9462
2.4%
Latin
Value Count Frequency (%)
a 55800
38.4%
b 26199
18.0%
c 19202
13.2%
f 17398
12.0%
d 16446
11.3%
e 10185
7.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 538896
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
6 96470
17.9%
0 66819
12.4%
a 55800
10.4%
7 51956
9.6%
9 46801
8.7%
8 34310
6.4%
1 33384
6.2%
2 31230
5.8%
b 26199
4.9%
c 19202
3.6%
Other values (6) 76725
14.2%

transactions.messages.attributes.userId
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

User Id of the person asking the question

Distinct 192
Distinct (%) 0.7%
Missing 3595
Missing (%) 11.5%
Infinite 0
Infinite (%) 0.0%
Mean 223.662363
Minimum 5
Maximum 292
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 244.8 KiB

Quantile statistics

Minimum 5
5-th percentile 85
Q1 181
median 253
Q3 270
95-th percentile 283
Maximum 292
Range 287
Interquartile range (IQR) 89

Descriptive statistics

Standard deviation 67.05887734
Coefficient of variation (CV) 0.2998219122
Kurtosis 0.5423181075
Mean 223.662363
Median Absolute Deviation (MAD) 26
Skewness -1.277074774
Sum 6201710
Variance 4496.89303
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
279 2403
7.7%
247 2181
7.0%
258 1851
5.9%
283 1125
3.6%
260 804
2.6%
282 761
2.4%
261 740
2.4%
241 664
2.1%
245 626
2.0%
280 593
1.9%
Other values (182) 15980
51.0%
(Missing) 3595
11.5%
Value Count Frequency (%)
5 200
0.6%
8 5
< 0.1%
10 9
< 0.1%
14 1
< 0.1%
20 7
< 0.1%
22 17
0.1%
24 10
< 0.1%
37 74
0.2%
40 16
0.1%
41 42
0.1%
Value Count Frequency (%)
292 4
< 0.1%
289 315
1.0%
288 329
1.1%
287 220
0.7%
286 2
< 0.1%
284 256
0.8%
283 1125
3.6%
282 761
2.4%
280 593
1.9%
279 2403
7.7%

transactions.messages.attributes.question
Categorical

HIGH CARDINALITY
MISSING

Question text of the transaction

Distinct 1807
Distinct (%) 9.0%
Missing 11237
Missing (%) 35.9%
Memory size 244.8 KiB
Cosa fate oggi?
103
Cosa fate di solito dopo aver sostenuto un esame?
102
Quando avete il prossimo esame?
100
Spesso la gente è alla ricerca di idee per preparare novi piatti. Avete consigli e suggerimenti?
52
Chi segue breaking Italy? 😍
52
Other values (1802)
19677

Length

Max length 254
Median length 185
Mean length 79.58861894
Min length 1

Characters and Unicode

Total characters 1598617
Distinct characters 353
Distinct categories 19
Distinct scripts 4
Distinct blocks 10
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1422
Unique (%) 7.1%

Sample

1st row When is Data Science graduation dates of the A. Y. 2021/2021?
2nd row When is Data Science graduation dates of the A. Y. 2021/2021?
3rd row When is Data Science graduation dates of the A. Y. 2021/2021?
4th row When is Data Science graduation dates of the A. Y. 2021/2021?
5th row When is Data Science graduation dates of the A. Y. 2021/2021?

Common Values

Value Count Frequency (%)
Cosa fate oggi? 103 0.3%
Cosa fate di solito dopo aver sostenuto un esame? 102 0.3%
Quando avete il prossimo esame? 100 0.3%
Spesso la gente è alla ricerca di idee per preparare novi piatti. Avete consigli e suggerimenti? 52 0.2%
Chi segue breaking Italy? 😍 52 0.2%
Grazie a chi mi ha risposto. Sì, ho scritto a quella mail. Comunque aspetterò qualche giorno ancora ;) 52 0.2%
Ma voi siete più riusciti a contattare chi ha creato questo esperimento? Perché io ho mandato un paio di mail, ma non ho avuto risposta...quindi volevo capire se magari hanno avuto dei problemi e mi basta aspettare ancora un po' 52 0.2%
Avete qualche libro da consigliare per il fine settimana? 52 0.2%
Faccio la pizza fatta in casa, quale farcitura mi suggerireste? 52 0.2%
Qualcuno è di STPC? Dovrei preparare Metodi Quantitativi e Psicometria ma non so proprio da dove partire. Consigli? 52 0.2%
Other values (1797) 19417 62.0%
(Missing) 11237 35.9%

Length

Histogram of lengths of the category
Value Count Frequency (%)
di 7431
2.7%
a 6989
2.6%
che 5365
2.0%
il 4540
1.7%
per 4278
1.6%
e 3850
1.4%
non 3760
1.4%
è 3279
1.2%
in 3204
1.2%
la 2857
1.0%
Other values (6450) 228045
83.4%

Most occurring characters

Value Count Frequency (%)
252435
15.8%
e 140601
8.8%
a 130236
8.1%
o 124421
7.8%
i 119912
7.5%
n 83331
5.2%
t 80935
5.1%
r 73439
4.6%
s 68629
4.3%
l 54264
3.4%
Other values (343) 470414
29.4%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 1232987 77.1%
Space Separator 252435 15.8%
Other Punctuation 47391 3.0%
Uppercase Letter 44155 2.8%
Decimal Number 7947 0.5%
Other Symbol 6558 0.4%
Control 2044 0.1%
Dash Punctuation 1760 0.1%
Close Punctuation 1350 0.1%
Open Punctuation 1166 0.1%
Other values (9) 824 0.1%

Most frequent character per category

Other Symbol
Value Count Frequency (%)
😂 1317
20.1%
🔝 508
7.7%
🙈 464
7.1%
😅 457
7.0%
😍 356
5.4%
🤔 343
5.2%
😉 212
3.2%
🤣 162
2.5%
🙃 154
2.3%
😬 153
2.3%
Other values (165) 2432
37.1%
Lowercase Letter
Value Count Frequency (%)
e 140601
11.4%
a 130236
10.6%
o 124421
10.1%
i 119912
9.7%
n 83331
6.8%
t 80935
6.6%
r 73439
6.0%
s 68629
5.6%
l 54264
4.4%
c 50799
4.1%
Other values (63) 306420
24.9%
Uppercase Letter
Value Count Frequency (%)
I 6128
13.9%
C 4967
11.2%
D 3853
8.7%
S 3679
8.3%
Q 3156
7.1%
A 2832
6.4%
T 2503
5.7%
M 2335
5.3%
P 1620
3.7%
R 1428
3.2%
Other values (47) 11654
26.4%
Other Punctuation
Value Count Frequency (%)
? 17034
35.9%
. 11835
25.0%
, 6294
13.3%
' 4551
9.6%
! 1965
4.1%
" 1793
3.8%
/ 1541
3.3%
: 1177
2.5%
@ 1031
2.2%
¿ 77
0.2%
Other values (6) 93
0.2%
Decimal Number
Value Count Frequency (%)
1 2618
32.9%
2 1075
13.5%
8 736
9.3%
4 722
9.1%
5 722
9.1%
0 705
8.9%
3 577
7.3%
7 317
4.0%
6 259
3.3%
9 216
2.7%
Math Symbol
Value Count Frequency (%)
+ 98
94.2%
> 3
2.9%
1
1.0%
= 1
1.0%
~ 1
1.0%
Close Punctuation
Value Count Frequency (%)
) 1317
97.6%
] 33
2.4%
Open Punctuation
Value Count Frequency (%)
( 1133
97.2%
[ 33
2.8%
Final Punctuation
Value Count Frequency (%)
234
79.6%
60
20.4%
Modifier Symbol
Value Count Frequency (%)
🏼 52
86.7%
🏻 8
13.3%
Space Separator
Value Count Frequency (%)
252435
100.0%
Control
Value Count Frequency (%)
2044
100.0%
Dash Punctuation
Value Count Frequency (%)
- 1760
100.0%
Nonspacing Mark
Value Count Frequency (%)
180
100.0%
Format
Value Count Frequency (%)
122
100.0%
Initial Punctuation
Value Count Frequency (%)
58
100.0%
Other Number
Value Count Frequency (%)
² 3
100.0%
Currency Symbol
Value Count Frequency (%)
£ 2
100.0%
Connector Punctuation
Value Count Frequency (%)
_ 1
100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 1245944 77.9%
Common 321173 20.1%
Cyrillic 31198 2.0%
Inherited 302 < 0.1%

Most frequent character per script

Common
Value Count Frequency (%)
252435
78.6%
? 17034
5.3%
. 11835
3.7%
, 6294
2.0%
' 4551
1.4%
1 2618
0.8%
2044
0.6%
! 1965
0.6%
" 1793
0.6%
- 1760
0.5%
Other values (211) 18844
5.9%
Latin
Value Count Frequency (%)
e 140601
11.3%
a 130236
10.5%
o 124421
10.0%
i 119912
9.6%
n 83331
6.7%
t 80935
6.5%
r 73439
5.9%
s 68629
5.5%
l 54264
4.4%
c 50799
4.1%
Other values (57) 319377
25.6%
Cyrillic
Value Count Frequency (%)
а 4063
13.0%
э 2648
8.5%
н 2097
6.7%
г 1772
5.7%
л 1622
5.2%
д 1511
4.8%
х 1472
4.7%
р 1451
4.7%
у 1421
4.6%
й 1376
4.4%
Other values (53) 11765
37.7%
Inherited
Value Count Frequency (%)
180
59.6%
122
40.4%

Most occurring blocks

Value Count Frequency (%)
ASCII 1549398 96.9%
Cyrillic 31198 2.0%
None 13097 0.8%
Emoticons 3856 0.2%
Punctuation 474 < 0.1%
Misc Symbols 244 < 0.1%
VS 180 < 0.1%
Dingbats 165 < 0.1%
Enclosed Alphanum Sup 4 < 0.1%
Geometric Shapes 1 < 0.1%

Most frequent character per block

ASCII
Value Count Frequency (%)
252435
16.3%
e 140601
9.1%
a 130236
8.4%
o 124421
8.0%
i 119912
7.7%
n 83331
5.4%
t 80935
5.2%
r 73439
4.7%
s 68629
4.4%
l 54264
3.5%
Other values (78) 421195
27.2%
None
Value Count Frequency (%)
è 4139
31.6%
à 2288
17.5%
ù 1159
8.8%
ò 1011
7.7%
ì 914
7.0%
é 631
4.8%
🔝 508
3.9%
🤔 343
2.6%
È 192
1.5%
🤣 162
1.2%
Other values (129) 1750
13.4%
Cyrillic
Value Count Frequency (%)
а 4063
13.0%
э 2648
8.5%
н 2097
6.7%
г 1772
5.7%
л 1622
5.2%
д 1511
4.8%
х 1472
4.7%
р 1451
4.7%
у 1421
4.6%
й 1376
4.4%
Other values (53) 11765
37.7%
Emoticons
Value Count Frequency (%)
😂 1317
34.2%
🙈 464
12.0%
😅 457
11.9%
😍 356
9.2%
😉 212
5.5%
🙃 154
4.0%
😬 153
4.0%
😊 115
3.0%
😰 102
2.6%
😱 101
2.6%
Other values (26) 425
11.0%
Punctuation
Value Count Frequency (%)
234
49.4%
122
25.7%
60
12.7%
58
12.2%
VS
Value Count Frequency (%)
180
100.0%
Misc Symbols
Value Count Frequency (%)
114
46.7%
54
22.1%
52
21.3%
7
2.9%
5
2.0%
5
2.0%
3
1.2%
1
0.4%
1
0.4%
1
0.4%
Dingbats
Value Count Frequency (%)
103
62.4%
52
31.5%
7
4.2%
1
0.6%
1
0.6%
1
0.6%
Enclosed Alphanum Sup
Value Count Frequency (%)
🇸 1
25.0%
🇨 1
25.0%
🇳 1
25.0%
🇲 1
25.0%
Geometric Shapes
Value Count Frequency (%)
1
100.0%

transactions.messages.attributes.transactionId
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Id of the transaction

Distinct 23
Distinct (%) 0.3%
Missing 22475
Missing (%) 71.8%
Infinite 0
Infinite (%) 0.0%
Mean 4.325610307
Minimum 1
Maximum 23
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 244.8 KiB
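
The percentages in this summary can be checked by hand. The sketch below assumes, based on the reported figures, that Missing (%) is taken over all 31323 observations while Distinct (%) is taken over the non-missing values only; the variable names are illustrative.

```python
n_rows = 31323      # "Number of observations" from the report overview
n_missing = 22475   # "Missing" for this variable
n_distinct = 23     # "Distinct" for this variable

# Rows that actually carry a value.
n_present = n_rows - n_missing

# Assumption: missing share is relative to all rows,
# distinct share is relative to non-missing rows.
missing_pct = round(100 * n_missing / n_rows, 1)
distinct_pct = round(100 * n_distinct / n_present, 1)

print(n_present, missing_pct, distinct_pct)
```

Both results agree with the values printed in the summary above.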

Quantile statistics

Minimum 1
5-th percentile 1
Q1 2
median 3
Q3 6
95-th percentile 11
Maximum 23
Range 22
Interquartile range (IQR) 4

Descriptive statistics

Standard deviation 3.313929906
Coefficient of variation (CV) 0.7661184598
Kurtosis 1.456095694
Mean 4.325610307
Median Absolute Deviation (MAD) 2
Skewness 1.248522519
Sum 38273
Variance 10.98213142
Monotonicity Not monotonic
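
Several of the figures above are tied together by simple identities, which gives a quick consistency check. The sketch below only re-derives them from the reported numbers (the raw column is not available here), and the variable names are illustrative.

```python
import math

# Values copied from the report for this variable.
n_present = 31323 - 22475      # observations minus missing
total = 38273                  # "Sum"
mean = 4.325610307
std = 3.313929906
q1, q3 = 2, 6
minimum, maximum = 1, 23

derived_mean = total / n_present   # mean = sum / non-missing count
cv = std / mean                    # coefficient of variation
variance = std ** 2                # variance is the squared standard deviation
iqr = q3 - q1                      # interquartile range
value_range = maximum - minimum    # range

print(derived_mean, cv, variance, iqr, value_range)
print(math.isclose(derived_mean, mean, abs_tol=1e-6))
```

The derived values match the reported mean, CV (0.7661184598), variance (10.98213142), IQR (4) and range (22) up to rounding.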
Histogram with fixed size bins (bins=23)

Common values

Value Count Frequency (%)
1 1910 6.1%
2 1492 4.8%
3 1143 3.6%
4 943 3.0%
5 767 2.4%
6 639 2.0%
7 483 1.5%
8 389 1.2%
9 316 1.0%
10 217 0.7%
Other values (13) 549 1.8%
(Missing) 22475 71.8%

Minimum 10 values

Value Count Frequency (%)
1 1910 6.1%
2 1492 4.8%
3 1143 3.6%
4 943 3.0%
5 767 2.4%
6 639 2.0%
7 483 1.5%
8 389 1.2%
9 316 1.0%
10 217 0.7%

Maximum 10 values

Value Count Frequency (%)
23 2 < 0.1%
22 2 < 0.1%
21 1 < 0.1%
20 2 < 0.1%
19 3 < 0.1%
18 3 < 0.1%
17 16 0.1%
16 20 0.1%
15 34 0.1%
14 58 0.2%

transactions.messages.attributes.answer
Categorical

HIGH CARDINALITY
MISSING

Answer to a question

Distinct 6973
Distinct (%) 91.1%
Missing 23666
Missing (%) 75.6%
Memory size 244.8 KiB
No 73
Yes 72
Thank you 16
[user143] 14
2 13
Other values (6968) 7469

Length

Max length 819
Median length 405
Mean length 50.68577772
Min length 1

Characters and Unicode

Total characters 388101
Distinct characters 348
Distinct categories 21
Distinct scripts 4
Distinct blocks 11
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 6720
Unique (%) 87.8%

Sample

1st row nobody knows 🤷‍♂️
2nd row Any thought
3rd row Good question😂
4th row Industrial Engineering per 2 year - ID11
5th row European and international studies, tu?

Common Values

Value Count Frequency (%)
No 73 0.2%
Yes 72 0.2%
Thank you 16 0.1%
[user143] 14 < 0.1%
2 13 < 0.1%
[user135] 12 < 0.1%
Good night 11 < 0.1%
[user166] 10 < 0.1%
yes 10 < 0.1%
Ok 10 < 0.1%
Other values (6963) 7416 23.7%
(Missing) 23666 75.6%
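
The Frequency (%) column of the table above appears to be computed over all rows, missing ones included (for example, 73 / 31323 ≈ 0.2%). A minimal sketch of such a table follows; the helper `common_values` and the list of answers are made up for illustration and are not taken from the profiler.

```python
from collections import Counter


def common_values(values, top=2):
    """Return (label, count, percent) rows; percent is relative to all rows."""
    n_rows = len(values)
    present = [v for v in values if v is not None]
    counts = Counter(present).most_common(top)

    def pct(count):
        return round(100 * count / n_rows, 1)

    rows = [(value, count, pct(count)) for value, count in counts]

    # Everything outside the top values is lumped into one "Other values" row.
    n_other = len(present) - sum(count for _, count in counts)
    n_distinct_other = len(set(present)) - len(counts)
    rows.append((f"Other values ({n_distinct_other})", n_other, pct(n_other)))

    # Missing entries get their own row.
    n_missing = n_rows - len(present)
    rows.append(("(Missing)", n_missing, pct(n_missing)))
    return rows


# Hypothetical column with four missing entries.
answers = ["No", "Yes", "No", None, "Ok", None, "No", "Yes", None, None]
table = common_values(answers)
for row in table:
    print(*row)
```

Because the denominator includes missing rows, the percentages of the listed values, the "Other values" row and the "(Missing)" row add up to 100%.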

Length

Histogram of lengths of the category
Value Count Frequency (%)
the 2689
3.6%
i 2666
3.6%
to 2320
3.1%
a 1934
2.6%
and 1752
2.4%
it 1356
1.8%
of 1136
1.5%
you 1074
1.4%
in 982
1.3%
is 968
1.3%
Other values (9220) 57327
77.3%
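
The word table above lists lowercased tokens with their share of all words in the column. The sketch below shows one way such counts could be produced; the tokenization (lowercase, split on whitespace, strip ASCII punctuation) is an assumption rather than the profiler's documented behaviour, and `word_frequencies` and the sample texts are hypothetical.

```python
import string
from collections import Counter


def word_frequencies(texts, top=3):
    """Return (word, count, percent of all words) for the most common words."""
    words = []
    for text in texts:
        if text is None:          # skip missing entries
            continue
        for token in text.lower().split():
            token = token.strip(string.punctuation)
            if token:
                words.append(token)
    total = len(words)
    counts = Counter(words)
    return [(w, c, round(100 * c / total, 1)) for w, c in counts.most_common(top)]


# Made-up answers, including one missing value.
texts = ["I like the sea.", "The sea and the sky", None, "i agree"]
rows = word_frequencies(texts)
for row in rows:
    print(*row)
```

Here the percentage is relative to the total number of words, which is consistent with the table above (2689 out of roughly 74204 words is 3.6%).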

Most occurring characters

Value Count Frequency (%)
66594
17.2%
e 34572
8.9%
t 27712
7.1%
o 25215
6.5%
a 23922
6.2%
n 21189
5.5%
i 20069
5.2%
s 18548
4.8%
r 15506
4.0%
h 13600
3.5%
Other values (338) 121174
31.2%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 291456 75.1%
Space Separator 66594 17.2%
Uppercase Letter 14172 3.7%
Other Punctuation 10466 2.7%
Decimal Number 2161 0.6%
Other Symbol 1452 0.4%
Close Punctuation 491 0.1%
Dash Punctuation 443 0.1%
Open Punctuation 328 0.1%
Final Punctuation 237 0.1%
Other values (11) 301 0.1%

Most frequent character per category

Other Symbol
Value Count Frequency (%)
😂 278
19.1%
😅 100
6.9%
🤣 56
3.9%
😊 56
3.9%
47
3.2%
😍 47
3.2%
27
1.9%
😁 26
1.8%
🌸 25
1.7%
🔝 22
1.5%
Other values (209) 768
52.9%
Lowercase Letter
Value Count Frequency (%)
e 34572
11.9%
t 27712
9.5%
o 25215
8.7%
a 23922
8.2%
n 21189
7.3%
i 20069
6.9%
s 18548
6.4%
r 15506
5.3%
h 13600
4.7%
l 12040
4.1%
Other values (30) 79083
27.1%
Uppercase Letter
Value Count Frequency (%)
I 4222
29.8%
T 1130
8.0%
A 889
6.3%
S 744
5.2%
N 644
4.5%
H 591
4.2%
B 586
4.1%
M 515
3.6%
Y 492
3.5%
W 469
3.3%
Other values (18) 3890
27.4%
Other Punctuation
Value Count Frequency (%)
. 3439
32.9%
, 3133
29.9%
' 2209
21.1%
! 539
5.2%
: 317
3.0%
? 304
2.9%
" 181
1.7%
/ 154
1.5%
& 55
0.5%
; 50
0.5%
Other values (9) 85
0.8%
Decimal Number
Value Count Frequency (%)
1 462
21.4%
0 416
19.3%
2 407
18.8%
3 197
9.1%
5 171
7.9%
4 143
6.6%
7 110
5.1%
8 92
4.3%
6 88
4.1%
9 75
3.5%
Modifier Symbol
Value Count Frequency (%)
🏻 29
74.4%
🏼 4
10.3%
^ 2
5.1%
🏾 2
5.1%
🏽 2
5.1%
Math Symbol
Value Count Frequency (%)
= 26
51.0%
+ 14
27.5%
> 6
11.8%
~ 3
5.9%
< 2
3.9%
Dash Punctuation
Value Count Frequency (%)
- 428
96.6%
14
3.2%
1
0.2%
Currency Symbol
Value Count Frequency (%)
6
50.0%
£ 5
41.7%
1
8.3%
Close Punctuation
Value Count Frequency (%)
) 387
78.8%
] 104
21.2%
Open Punctuation
Value Count Frequency (%)
( 224
68.3%
[ 104
31.7%
Final Punctuation
Value Count Frequency (%)
216
91.1%
21
8.9%
Initial Punctuation
Value Count Frequency (%)
21
75.0%
7
25.0%
Space Separator
Value Count Frequency (%)
66594
100.0%
Nonspacing Mark
Value Count Frequency (%)
108
100.0%
Connector Punctuation
Value Count Frequency (%)
_ 29
100.0%
Format
Value Count Frequency (%)
17
100.0%
Control
Value Count Frequency (%)
8
100.0%
Other Number
Value Count Frequency (%)
² 6
100.0%
Other Letter
Value Count Frequency (%)
2
100.0%
Enclosing Mark
Value Count Frequency (%)
1
100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 305628 78.7%
Common 82345 21.2%
Inherited 126 < 0.1%
Hangul 2 < 0.1%

Most frequent character per script

Common
Value Count Frequency (%)
66594
80.9%
. 3439
4.2%
, 3133
3.8%
' 2209
2.7%
! 539
0.7%
1 462
0.6%
- 428
0.5%
0 416
0.5%
2 407
0.5%
) 387
0.5%
Other values (266) 4331
5.3%
Latin
Value Count Frequency (%)
e 34572
11.3%
t 27712
9.1%
o 25215
8.3%
a 23922
7.8%
n 21189
6.9%
i 20069
6.6%
s 18548
6.1%
r 15506
5.1%
h 13600
4.4%
l 12040
3.9%
Other values (58) 93255
30.5%
Inherited
Value Count Frequency (%)
108
85.7%
17
13.5%
1
0.8%
Hangul
Value Count Frequency (%)
2
100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 386083 99.5%
Emoticons 803 0.2%
None 613 0.2%
Punctuation 304 0.1%
VS 108 < 0.1%
Misc Symbols 84 < 0.1%
Dingbats 65 < 0.1%
Enclosed Alphanum Sup 32 < 0.1%
Currency Symbols 6 < 0.1%
Jamo 2 < 0.1%

Most frequent character per block

ASCII
Value Count Frequency (%)
66594
17.2%
e 34572
9.0%
t 27712
7.2%
o 25215
6.5%
a 23922
6.2%
n 21189
5.5%
i 20069
5.2%
s 18548
4.8%
r 15506
4.0%
h 13600
3.5%
Other values (81) 119156
30.9%
Emoticons
Value Count Frequency (%)
😂 278
34.6%
😅 100
12.5%
😊 56
7.0%
😍 47
5.9%
😁 26
3.2%
😭 21
2.6%
🙂 20
2.5%
😆 20
2.5%
🙈 18
2.2%
🙃 18
2.2%
Other values (40) 199
24.8%
Punctuation
Value Count Frequency (%)
216
71.1%
21
6.9%
21
6.9%
17
5.6%
14
4.6%
7
2.3%
7
2.3%
1
0.3%
VS
Value Count Frequency (%)
108
100.0%
None
Value Count Frequency (%)
🤣 56
9.1%
ø 34
5.5%
🏻 29
4.7%
🌸 25
4.1%
🔝 22
3.6%
🤔 21
3.4%
👌 18
2.9%
🤗 17
2.8%
🤩 16
2.6%
💪 15
2.4%
Other values (150) 360
58.7%
Dingbats
Value Count Frequency (%)
47
72.3%
10
15.4%
3
4.6%
2
3.1%
2
3.1%
1
1.5%
Misc Symbols
Value Count Frequency (%)
27
32.1%
22
26.2%
9
10.7%
6
7.1%
4
4.8%
3
3.6%
2
2.4%
2
2.4%
2
2.4%
1
1.2%
Other values (6) 6
7.1%
Currency Symbols
Value Count Frequency (%)
6
100.0%
Enclosed Alphanum Sup
Value Count Frequency (%)
🇩 5
15.6%
🇪 4
12.5%
🇰 4
12.5%
🇬 3
9.4%
🇹 3
9.4%
🇮 3
9.4%
🇧 2
6.2%
🇳 2
6.2%
🇲 2
6.2%
🇵 1
3.1%
Other values (3) 3
9.4%
Jamo
Value Count Frequency (%)
2
100.0%
Enclosed Ideographic Sup
Value Count Frequency (%)
🉐 1
100.0%

transactions.attributes.answer
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

Answer to a question

Distinct 7069
Distinct (%) 92.5%
Missing 23677
Missing (%) 75.6%
Memory size 244.8 KiB
No 43
Yes 22
Тийм 18
Баярлалаа 15
[user143] 14
Other values (7064) 7534

Length

Max length 806
Median length 391
Mean length 50.88660738
Min length 1

Characters and Unicode

Total characters 389079
Distinct characters 420
Distinct categories 21
Distinct scripts 5
Distinct blocks 12
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 6828
Unique (%) 89.3%

Sample

1st row nobody knows 🤷‍♂️
2nd row Qualsiasi penso
3rd row Bella domanda😂
4th row Ingegneria Industriale al 2 anno - ID11
5th row European and international studies, tu?

Common Values

Value Count Frequency (%)
No 43 0.1%
Yes 22 0.1%
Тийм 18 0.1%
Баярлалаа 15 < 0.1%
[user143] 14 < 0.1%
Si 13 < 0.1%
2 12 < 0.1%
Үгүй 12 < 0.1%
[user135] 11 < 0.1%
[user166] 10 < 0.1%
Other values (7059) 7476 23.9%
(Missing) 23677 75.6%

Length

Histogram of lengths of the category
Value Count Frequency (%)
i 1184
1.7%
a 1069
1.5%
the 900
1.3%
to 789
1.1%
and 710
1.0%
л 570
0.8%
in 534
0.8%
байна 471
0.7%
нь 440
0.6%
425
0.6%
Other values (16054) 62078
89.7%

Most occurring characters

Value Count Frequency (%)
61499
15.8%
e 23511
6.0%
a 18835
4.8%
o 18296
4.7%
i 15087
3.9%
t 15028
3.9%
а 14824
3.8%
n 14205
3.7%
s 12298
3.2%
r 11802
3.0%
Other values (410) 183694
47.2%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 301678 77.5%
Space Separator 61499 15.8%
Uppercase Letter 12249 3.1%
Other Punctuation 8063 2.1%
Decimal Number 2192 0.6%
Other Symbol 1453 0.4%
Close Punctuation 489 0.1%
Dash Punctuation 399 0.1%
Open Punctuation 326 0.1%
Final Punctuation 250 0.1%
Other values (11) 481 0.1%

Most frequent character per category

Other Symbol
Value Count Frequency (%)
😂 278
19.1%
😅 100
6.9%
😊 56
3.9%
🤣 56
3.9%
47
3.2%
😍 47
3.2%
27
1.9%
😁 26
1.8%
🌸 25
1.7%
🔝 22
1.5%
Other values (210) 769
52.9%
Lowercase Letter
Value Count Frequency (%)
e 23511
7.8%
a 18835
6.2%
o 18296
6.1%
i 15087
5.0%
t 15028
5.0%
а 14824
4.9%
n 14205
4.7%
s 12298
4.1%
r 11802
3.9%
э 9142
3.0%
Other values (66) 148650
49.3%
Uppercase Letter
Value Count Frequency (%)
I 1748
14.3%
A 713
5.8%
S 654
5.3%
T 515
4.2%
N 496
4.0%
Х 475
3.9%
D 423
3.5%
C 388
3.2%
P 375
3.1%
M 372
3.0%
Other values (52) 6090
49.7%
Other Punctuation
Value Count Frequency (%)
. 3189
39.6%
, 2555
31.7%
' 663
8.2%
! 539
6.7%
: 308
3.8%
? 277
3.4%
" 174
2.2%
/ 159
2.0%
& 55
0.7%
; 49
0.6%
Other values (10) 95
1.2%
Decimal Number
Value Count Frequency (%)
1 462
21.1%
2 440
20.1%
0 371
16.9%
3 207
9.4%
5 177
8.1%
4 153
7.0%
7 122
5.6%
8 95
4.3%
6 90
4.1%
9 75
3.4%
Modifier Symbol
Value Count Frequency (%)
🏻 29
74.4%
🏼 4
10.3%
🏽 2
5.1%
🏾 2
5.1%
^ 2
5.1%
Math Symbol
Value Count Frequency (%)
= 26
50.0%
+ 14
26.9%
> 6
11.5%
~ 4
7.7%
< 2
3.8%
Dash Punctuation
Value Count Frequency (%)
- 396
99.2%
2
0.5%
1
0.3%
Currency Symbol
Value Count Frequency (%)
6
50.0%
£ 5
41.7%
1
8.3%
Close Punctuation
Value Count Frequency (%)
) 386
78.9%
] 103
21.1%
Final Punctuation
Value Count Frequency (%)
228
91.2%
22
8.8%
Open Punctuation
Value Count Frequency (%)
( 223
68.4%
[ 103
31.6%
Initial Punctuation
Value Count Frequency (%)
22
75.9%
7
24.1%
Space Separator
Value Count Frequency (%)
61499
100.0%
Control
Value Count Frequency (%)
170
100.0%
Nonspacing Mark
Value Count Frequency (%)
104
100.0%
Format
Value Count Frequency (%)
37
100.0%
Connector Punctuation
Value Count Frequency (%)
_ 29
100.0%
Other Number
Value Count Frequency (%)
² 6
100.0%
Other Letter
Value Count Frequency (%)
2
100.0%
Enclosing Mark
Value Count Frequency (%)
1
100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 206592 53.1%
Cyrillic 107335 27.6%
Common 75008 19.3%
Inherited 142 < 0.1%
Hangul 2 < 0.1%

Most frequent character per script

Common
Value Count Frequency (%)
61499
82.0%
. 3189
4.3%
, 2555
3.4%
' 663
0.9%
! 539
0.7%
1 462
0.6%
2 440
0.6%
- 396
0.5%
) 386
0.5%
0 371
0.5%
Other values (268) 4508
6.0%
Latin
Value Count Frequency (%)
e 23511
11.4%
a 18835
9.1%
o 18296
8.9%
i 15087
7.3%
t 15028
7.3%
n 14205
6.9%
s 12298
6.0%
r 11802
5.7%
l 8931
4.3%
d 7086
3.4%
Other values (63) 61513
29.8%
Cyrillic
Value Count Frequency (%)
а 14824
13.8%
э 9142
8.5%
г 6757
6.3%
н 6718
6.3%
х 5676
5.3%
л 5664
5.3%
д 5578
5.2%
й 5379
5.0%
о 5196
4.8%
р 4655
4.3%
Other values (55) 37746
35.2%
Inherited
Value Count Frequency (%)
104
73.2%
37
26.1%
1
0.7%
Hangul
Value Count Frequency (%)
2
100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 278659 71.6%
Cyrillic 107335 27.6%
None 1662 0.4%
Emoticons 803 0.2%
Punctuation 326 0.1%
VS 104 < 0.1%
Misc Symbols 84 < 0.1%
Dingbats 65 < 0.1%
Enclosed Alphanum Sup 32 < 0.1%
Currency Symbols 6 < 0.1%
Other values (2) 3 < 0.1%

Most frequent character per block

ASCII
Value Count Frequency (%)
61499
22.1%
e 23511
8.4%
a 18835
6.8%
o 18296
6.6%
i 15087
5.4%
t 15028
5.4%
n 14205
5.1%
s 12298
4.4%
r 11802
4.2%
l 8931
3.2%
Other values (81) 79167
28.4%
Cyrillic
Value Count Frequency (%)
а 14824
13.8%
э 9142
8.5%
г 6757
6.3%
н 6718
6.3%
х 5676
5.3%
л 5664
5.3%
д 5578
5.2%
й 5379
5.0%
о 5196
4.8%
р 4655
4.3%
Other values (55) 37746
35.2%
Emoticons
Value Count Frequency (%)
😂 278
34.6%
😅 100
12.5%
😊 56
7.0%
😍 47
5.9%
😁 26
3.2%
😭 21
2.6%
😆 20
2.5%
🙂 20
2.5%
🙃 18
2.2%
🙈 18
2.2%
Other values (40) 199
24.8%
Punctuation
Value Count Frequency (%)
228
69.9%
37
11.3%
22
6.7%
22
6.7%
7
2.1%
7
2.1%
2
0.6%
1
0.3%
None
Value Count Frequency (%)
í 199
12.0%
è 164
9.9%
á 138
8.3%
é 129
7.8%
ó 93
5.6%
à 75
4.5%
ì 69
4.2%
ù 57
3.4%
ò 57
3.4%
🤣 56
3.4%
Other values (157) 625
37.6%
VS
Value Count Frequency (%)
104
100.0%
Dingbats
Value Count Frequency (%)
47
72.3%
10
15.4%
3
4.6%
2
3.1%
2
3.1%
1
1.5%
Misc Symbols
Value Count Frequency (%)
27
32.1%
22
26.2%
9
10.7%
6
7.1%
4
4.8%
3
3.6%
2
2.4%
2
2.4%
2
2.4%
1
1.2%
Other values (6) 6
7.1%
Currency Symbols
Value Count Frequency (%)
6
100.0%
Enclosed Alphanum Sup
Value Count Frequency (%)
🇩 5
15.6%
🇪 4
12.5%
🇰 4
12.5%
🇮 3
9.4%
🇹 3
9.4%
🇬 3
9.4%
🇧 2
6.2%
🇳 2
6.2%
🇲 2
6.2%
🇵 1
3.1%
Other values (3) 3
9.4%
Compat Jamo
Value Count Frequency (%)
2
100.0%
Enclosed Ideographic Sup
Value Count Frequency (%)
🉐 1
100.0%

transactions.attributes.transactionId
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Id of the transaction

Distinct 18
Distinct (%) 1.5%
Missing 30119
Missing (%) 96.2%
Infinite 0
Infinite (%) 0.0%
Mean 2.893687708
Minimum 1
Maximum 23
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 244.8 KiB

Quantile statistics

Minimum 1
5-th percentile 1
Q1 1
median 2
Q3 4
95-th percentile 8
Maximum 23
Range 22
Interquartile range (IQR) 3

Descriptive statistics

Standard deviation 2.570436182
Coefficient of variation (CV) 0.8882908046
Kurtosis 7.828999478
Mean 2.893687708
Median Absolute Deviation (MAD) 1
Skewness 2.261642242
Sum 3484
Variance 6.607142167
Monotonicity Not monotonic
Histogram with fixed size bins (bins=18)

Common values

Value Count Frequency (%)
1 484 1.5%
2 244 0.8%
3 134 0.4%
4 100 0.3%
5 72 0.2%
6 66 0.2%
7 32 0.1%
8 24 0.1%
9 22 0.1%
10 6 < 0.1%
Other values (8) 20 0.1%
(Missing) 30119 96.2%

Minimum 10 values

Value Count Frequency (%)
1 484 1.5%
2 244 0.8%
3 134 0.4%
4 100 0.3%
5 72 0.2%
6 66 0.2%
7 32 0.1%
8 24 0.1%
9 22 0.1%
10 6 < 0.1%

Maximum 10 values

Value Count Frequency (%)
23 1 < 0.1%
19 1 < 0.1%
17 2 < 0.1%
15 3 < 0.1%
14 2 < 0.1%
13 3 < 0.1%
12 4 < 0.1%
11 4 < 0.1%
10 6 < 0.1%
9 22 0.1%

transactions.attributes.reason
Categorical

HIGH CORRELATION
MISSING

Reason given for reporting a question or answer (the observed values are spam and abusive)

Distinct 2
Distinct (%) 28.6%
Missing 31316
Missing (%) > 99.9%
Memory size 244.8 KiB
spam 6
abusive 1

Length

Max length 7
Median length 4
Mean length 4.428571429
Min length 4

Characters and Unicode

Total characters 31
Distinct characters 9
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1
Unique (%) 14.3%

Sample

1st row spam
2nd row abusive
3rd row spam
4th row spam
5th row spam

Common Values

Value Count Frequency (%)
spam 6 < 0.1%
abusive 1 < 0.1%
(Missing) 31316 > 99.9%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
spam 6 85.7%
abusive 1 14.3%

Most occurring characters

Value Count Frequency (%)
s 7
22.6%
a 7
22.6%
p 6
19.4%
m 6
19.4%
b 1
3.2%
u 1
3.2%
i 1
3.2%
v 1
3.2%
e 1
3.2%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 31 100.0%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
s 7
22.6%
a 7
22.6%
p 6
19.4%
m 6
19.4%
b 1
3.2%
u 1
3.2%
i 1
3.2%
v 1
3.2%
e 1
3.2%

Most occurring scripts

Value Count Frequency (%)
Latin 31 100.0%

Most frequent character per script

Latin
Value Count Frequency (%)
s 7
22.6%
a 7
22.6%
p 6
19.4%
m 6
19.4%
b 1
3.2%
u 1
3.2%
i 1
3.2%
v 1
3.2%
e 1
3.2%

Most occurring blocks

Value Count Frequency (%)
ASCII 31 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
s 7
22.6%
a 7
22.6%
p 6
19.4%
m 6
19.4%
b 1
3.2%
u 1
3.2%
i 1
3.2%
v 1
3.2%
e 1
3.2%

goal.name
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

Question (without duplicating extended questions)

Distinct 1830
Distinct (%) 98.5%
Missing 29466
Missing (%) 94.1%
Memory size 244.8 KiB
What do you study? 3
Ideas on how to earn some pounds while studying??!! :) 3
what to do 3
What are you doing today? 2
Do you use LSE Life? 2
Other values (1825) 1844

Length

Max length 274
Median length 184
Mean length 66.11362412
Min length 1

Characters and Unicode

Total characters 122773
Distinct characters 283
Distinct categories 18
Distinct scripts 3
Distinct blocks 9
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1806
Unique (%) 97.3%

Sample

1st row When is Data Science graduation dates of the A. Y. 2021/2021?
2nd row But what kind of questions should be asked?
3rd row What do you study?
4th row Have you already received diaries to complete with i-Log?
5th row Does anyone know how long the experiment will last?

Common Values

Value Count Frequency (%)
What do you study? 3 < 0.1%
Ideas on how to earn some pounds while studying??!! :) 3 < 0.1%
what to do 3 < 0.1%
What are you doing today? 2 < 0.1%
Do you use LSE Life? 2 < 0.1%
What is being done 2 < 0.1%
Thoughts on mask restrictions? 2 < 0.1%
Do you see my name? 2 < 0.1%
If you express your current psychology in an emoji? 2 < 0.1%
What's your favorite movie? 2 < 0.1%
Other values (1820) 1834 5.9%
(Missing) 29466 94.1%

Length

Histogram of lengths of the category
Value Count Frequency (%)
you 1056
4.6%
the 887
3.9%
to 750
3.3%
do 537
2.3%
a 466
2.0%
what 442
1.9%
i 383
1.7%
is 369
1.6%
of 355
1.5%
in 346
1.5%
Other values (3629) 17415
75.7%

Most occurring characters

Value Count Frequency (%)
21187
17.3%
e 10825
8.8%
o 9423
7.7%
t 8259
6.7%
a 7384
6.0%
n 6515
5.3%
i 6013
4.9%
s 5576
4.5%
r 5400
4.4%
h 4566
3.7%
Other values (273) 37625
30.6%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 92816 75.6%
Space Separator 21187 17.3%
Uppercase Letter 3804 3.1%
Other Punctuation 3478 2.8%
Other Symbol 544 0.4%
Decimal Number 438 0.4%
Close Punctuation 162 0.1%
Open Punctuation 121 0.1%
Dash Punctuation 118 0.1%
Final Punctuation 40 < 0.1%
Other values (8) 65 0.1%

Most frequent character per category

Other Symbol
Value Count Frequency (%)
😂 104
19.1%
😊 16
2.9%
😅 15
2.8%
14
2.6%
🤔 13
2.4%
🙂 11
2.0%
🌸 11
2.0%
🔝 10
1.8%
😍 10
1.8%
🙈 9
1.7%
Other values (165) 331
60.8%
Lowercase Letter
Value Count Frequency (%)
e 10825
11.7%
o 9423
10.2%
t 8259
8.9%
a 7384
8.0%
n 6515
7.0%
i 6013
6.5%
s 5576
6.0%
r 5400
5.8%
h 4566
4.9%
u 3691
4.0%
Other values (24) 25164
27.1%
Uppercase Letter
Value Count Frequency (%)
I 782
20.6%
W 585
15.4%
H 384
10.1%
D 326
8.6%
A 324
8.5%
T 204
5.4%
S 140
3.7%
C 129
3.4%
F 101
2.7%
B 96
2.5%
Other values (16) 733
19.3%
Other Punctuation
Value Count Frequency (%)
? 1669
48.0%
, 564
16.2%
. 489
14.1%
' 398
11.4%
: 91
2.6%
/ 72
2.1%
! 71
2.0%
" 61
1.8%
@ 21
0.6%
# 12
0.3%
Other values (6) 30
0.9%
Decimal Number
Value Count Frequency (%)
1 109
24.9%
2 95
21.7%
0 86
19.6%
8 31
7.1%
4 28
6.4%
5 24
5.5%
3 23
5.3%
9 15
3.4%
7 14
3.2%
6 13
3.0%
Math Symbol
Value Count Frequency (%)
> 3
33.3%
+ 3
33.3%
1
11.1%
~ 1
11.1%
= 1
11.1%
Close Punctuation
Value Count Frequency (%)
) 132
81.5%
] 30
18.5%
Dash Punctuation
Value Count Frequency (%)
- 114
96.6%
4
3.4%
Open Punctuation
Value Count Frequency (%)
( 91
75.2%
[ 30
24.8%
Final Punctuation
Value Count Frequency (%)
34
85.0%
6
15.0%
Modifier Symbol
Value Count Frequency (%)
🏻 8
72.7%
🏼 3
27.3%
Space Separator
Value Count Frequency (%)
21187
100.0%
Nonspacing Mark
Value Count Frequency (%)
29
100.0%
Format
Value Count Frequency (%)
6
100.0%
Initial Punctuation
Value Count Frequency (%)
4
100.0%
Other Number
Value Count Frequency (%)
² 3
100.0%
Currency Symbol
Value Count Frequency (%)
£ 2
100.0%
Connector Punctuation
Value Count Frequency (%)
_ 1
100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 96620 78.7%
Common 26118 21.3%
Inherited 35 < 0.1%

Most frequent character per script

Common
Value Count Frequency (%)
21187
81.1%
? 1669
6.4%
, 564
2.2%
. 489
1.9%
' 398
1.5%
) 132
0.5%
- 114
0.4%
1 109
0.4%
😂 104
0.4%
2 95
0.4%
Other values (211) 1257
4.8%
Latin
Value Count Frequency (%)
e 10825
11.2%
o 9423
9.8%
t 8259
8.5%
a 7384
7.6%
n 6515
6.7%
i 6013
6.2%
s 5576
5.8%
r 5400
5.6%
h 4566
4.7%
u 3691
3.8%
Other values (50) 28968
30.0%
Inherited
Value Count Frequency (%)
29
82.9%
6
17.1%

Most occurring blocks

Value Count Frequency (%)
ASCII 122112 99.5%
None 258 0.2%
Emoticons 253 0.2%
Punctuation 54 < 0.1%
Misc Symbols 43 < 0.1%
VS 29 < 0.1%
Dingbats 19 < 0.1%
Enclosed Alphanum Sup 4 < 0.1%
Geometric Shapes 1 < 0.1%

Most frequent character per block

ASCII
Value Count Frequency (%)
21187
17.4%
e 10825
8.9%
o 9423
7.7%
t 8259
6.8%
a 7384
6.0%
n 6515
5.3%
i 6013
4.9%
s 5576
4.6%
r 5400
4.4%
h 4566
3.7%
Other values (77) 36964
30.3%
Emoticons
Value Count Frequency (%)
😂 104
41.1%
😊 16
6.3%
😅 15
5.9%
🙂 11
4.3%
😍 10
4.0%
🙈 9
3.6%
😉 8
3.2%
😭 6
2.4%
😆 6
2.4%
😌 6
2.4%
Other values (26) 62
24.5%
Punctuation
Value Count Frequency (%)
34
63.0%
6
11.1%
6
11.1%
4
7.4%
4
7.4%
VS
Value Count Frequency (%)
29
100.0%
Misc Symbols
Value Count Frequency (%)
14
32.6%
7
16.3%
5
11.6%
5
11.6%
4
9.3%
3
7.0%
1
2.3%
1
2.3%
1
2.3%
1
2.3%
None
Value Count Frequency (%)
🤔 13
5.0%
🌸 11
4.3%
🔝 10
3.9%
🤣 9
3.5%
👰 8
3.1%
🏻 8
3.1%
🤗 8
3.1%
💎 7
2.7%
é 5
1.9%
🌞 4
1.6%
Other values (122) 175
67.8%
Dingbats
Value Count Frequency (%)
8
42.1%
7
36.8%
1
5.3%
1
5.3%
1
5.3%
1
5.3%
Geometric Shapes
Value Count Frequency (%)
1
100.0%
Enclosed Alphanum Sup
Value Count Frequency (%)
🇸 1
25.0%
🇨 1
25.0%
🇳 1
25.0%
🇲 1
25.0%

goal.description
Unsupported

MISSING
REJECTED
UNSUPPORTED

Empty column

Missing 31323
Missing (%) 100.0%
Memory size 244.8 KiB

attributes.kindOfAnswerer
Categorical

MISSING

-

Distinct 3
Distinct (%) 0.2%
Missing 29495
Missing (%) 94.2%
Memory size 244.8 KiB
ask_to_anyone 1414
ask_to_similar 268
ask_to_different 146

Length

Max length 16
Median length 13
Mean length 13.38621444
Min length 13

Characters and Unicode

Total characters 24470
Distinct characters 15
Distinct categories 2
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row ask_to_different
2nd row ask_to_anyone
3rd row ask_to_anyone
4th row ask_to_anyone
5th row ask_to_anyone

Common Values

Value Count Frequency (%)
ask_to_anyone 1414 4.5%
ask_to_similar 268 0.9%
ask_to_different 146 0.5%
(Missing) 29495 94.2%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
ask_to_anyone 1414 77.4%
ask_to_similar 268 14.7%
ask_to_different 146 8.0%

Most occurring characters

Value Count Frequency (%)
_ 3656
14.9%
a 3510
14.3%
o 3242
13.2%
n 2974
12.2%
s 2096
8.6%
t 1974
8.1%
k 1828
7.5%
e 1706
7.0%
y 1414
5.8%
i 682
2.8%
Other values (5) 1388
5.7%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 20814 85.1%
Connector Punctuation 3656 14.9%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
a 3510
16.9%
o 3242
15.6%
n 2974
14.3%
s 2096
10.1%
t 1974
9.5%
k 1828
8.8%
e 1706
8.2%
y 1414
6.8%
i 682
3.3%
r 414
2.0%
Other values (4) 974
4.7%
Connector Punctuation
Value Count Frequency (%)
_ 3656
100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 20814 85.1%
Common 3656 14.9%

Most frequent character per script

Latin
Value Count Frequency (%)
a 3510
16.9%
o 3242
15.6%
n 2974
14.3%
s 2096
10.1%
t 1974
9.5%
k 1828
8.8%
e 1706
8.2%
y 1414
6.8%
i 682
3.3%
r 414
2.0%
Other values (4) 974
4.7%
Common
Value Count Frequency (%)
_ 3656
100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 24470 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
_ 3656
14.9%
a 3510
14.3%
o 3242
13.2%
n 2974
12.2%
s 2096
8.6%
t 1974
8.1%
k 1828
7.5%
e 1706
7.0%
y 1414
5.8%
i 682
2.8%
Other values (5) 1388
5.7%

attributes.answeredDetails
Categorical

HIGH CARDINALITY
MISSING

-

Distinct 1064
Distinct (%) 58.2%
Missing 29495
Missing (%) 94.2%
Memory size 244.8 KiB
Үгүй 51
. 45
No 43
Interesse comune 38
Masters 37
Other values (1059) 1614

Length

Max length 287
Median length 120
Mean length 22.31509847
Min length 1

Characters and Unicode

Total characters 40792
Distinct characters 169
Distinct categories 14
Distinct scripts 4
Distinct blocks 8
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 930
Unique (%) 50.9%

Sample

1st row Because it is interesting to deal with people different than me.
2nd row A tutti perché non capisco che tipo di domande dovrei fare.. booog
3rd row Mi sembrava (in un contesto universitario) un buon modo per rompere il ghiaccio
4th row Domanda rivolta a tutti i partecipanti all'indagine
5th row Perché si tratta di informazioni che riguardano tutti i partecipanti all'esperimento

Common Values

Value Count Frequency (%)
Үгүй 51 0.2%
. 45 0.1%
No 43 0.1%
Interesse comune 38 0.1%
Masters 37 0.1%
Хэн ч болно 28 0.1%
хэн ч болно 27 0.1%
Илүү олон хүнээс асуумаар байна 25 0.1%
Аль болох олон 23 0.1%
зүгээр л 23 0.1%
Other values (1054) 1488 4.8%
(Missing) 29495 94.2%

Length

Histogram of lengths of the category
Value Count Frequency (%)
to 204
2.8%
i 168
2.3%
a 136
1.8%
the 101
1.4%
question 87
1.2%
of 86
1.2%
general 79
1.1%
ч 72
1.0%
want 70
0.9%
болно 69
0.9%
Other values (1562) 6297
85.5%

Most occurring characters

Value Count Frequency (%)
(space) 5551
13.6%
e 3880
9.5%
o 2478
6.1%
a 2246
5.5%
n 2227
5.5%
s 2176
5.3%
t 2143
5.3%
i 2024
5.0%
r 1883
4.6%
u 1020
2.5%
Other values (159) 15164
37.2%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 33053
81.0%
Space Separator 5551
13.6%
Uppercase Letter 1724
4.2%
Other Punctuation 331
0.8%
Other Symbol 46
0.1%
Final Punctuation 40
0.1%
Decimal Number 19
< 0.1%
Close Punctuation 9
< 0.1%
Dash Punctuation 7
< 0.1%
Format 4
< 0.1%
Other values (4) 8
< 0.1%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
e 3880
11.7%
o 2478
7.5%
a 2246
6.8%
n 2227
6.7%
s 2176
6.6%
t 2143
6.5%
i 2024
6.1%
r 1883
5.7%
u 1020
3.1%
l 910
2.8%
Other values (59) 12066
36.5%
Uppercase Letter
Value Count Frequency (%)
I 300
17.4%
P 142
8.2%
Х 110
6.4%
C 83
4.8%
A 81
4.7%
N 80
4.6%
M 72
4.2%
B 65
3.8%
S 63
3.7%
Ү 61
3.5%
Other values (35) 667
38.7%
Other Symbol
Value Count Frequency (%)
😂 5
10.9%
🤷 5
10.9%
4
8.7%
👍 3
6.5%
😁 2
4.3%
2
4.3%
😍 2
4.3%
🤫 2
4.3%
😜 2
4.3%
😃 1
2.2%
Other values (18) 18
39.1%
Other Punctuation
Value Count Frequency (%)
. 156
47.1%
' 78
23.6%
, 43
13.0%
" 18
5.4%
! 13
3.9%
/ 9
2.7%
? 8
2.4%
: 5
1.5%
; 1
0.3%
Decimal Number
Value Count Frequency (%)
1 7
36.8%
8 5
26.3%
4 3
15.8%
2 2
10.5%
3 1
5.3%
7 1
5.3%
Final Punctuation
Value Count Frequency (%)
39
97.5%
1
2.5%
Close Punctuation
Value Count Frequency (%)
) 8
88.9%
] 1
11.1%
Open Punctuation
Value Count Frequency (%)
( 3
75.0%
[ 1
25.0%
Space Separator
Value Count Frequency (%)
(space) 5551
100.0%
Dash Punctuation
Value Count Frequency (%)
- 7
100.0%
Format
Value Count Frequency (%)
4
100.0%
Nonspacing Mark
Value Count Frequency (%)
2
100.0%
Initial Punctuation
Value Count Frequency (%)
1
100.0%
Control
Value Count Frequency (%)
1
100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 28631
70.2%
Cyrillic 6146
15.1%
Common 6009
14.7%
Inherited 6
< 0.1%

Most frequent character per script

Latin
Value Count Frequency (%)
e 3880
13.6%
o 2478
8.7%
a 2246
7.8%
n 2227
7.8%
s 2176
7.6%
t 2143
7.5%
i 2024
7.1%
r 1883
6.6%
u 1020
3.6%
l 910
3.2%
Other values (51) 7644
26.7%
Common
Value Count Frequency (%)
(space) 5551
92.4%
. 156
2.6%
' 78
1.3%
, 43
0.7%
39
0.6%
" 18
0.3%
! 13
0.2%
/ 9
0.1%
) 8
0.1%
? 8
0.1%
Other values (43) 86
1.4%
Cyrillic
Value Count Frequency (%)
а 592
9.6%
э 560
9.1%
о 481
7.8%
н 463
7.5%
л 425
6.9%
ү 360
5.9%
г 321
5.2%
х 305
5.0%
й 270
4.4%
р 260
4.2%
Other values (43) 2109
34.3%
Inherited
Value Count Frequency (%)
4
66.7%
2
33.3%

Most occurring blocks

Value Count Frequency (%)
ASCII 34369
84.3%
Cyrillic 6146
15.1%
None 202
0.5%
Punctuation 45
0.1%
Emoticons 21
0.1%
Misc Symbols 6
< 0.1%
VS 2
< 0.1%
Dingbats 1
< 0.1%

Most frequent character per block

ASCII
Value Count Frequency (%)
(space) 5551
16.2%
e 3880
11.3%
o 2478
7.2%
a 2246
6.5%
n 2227
6.5%
s 2176
6.3%
t 2143
6.2%
i 2024
5.9%
r 1883
5.5%
u 1020
3.0%
Other values (62) 8741
25.4%
Cyrillic
Value Count Frequency (%)
а 592
9.6%
э 560
9.1%
о 481
7.8%
н 463
7.5%
л 425
6.9%
ү 360
5.9%
г 321
5.2%
х 305
5.0%
й 270
4.4%
р 260
4.2%
Other values (43) 2109
34.3%
None
Value Count Frequency (%)
é 46
22.8%
à 28
13.9%
è 26
12.9%
ó 23
11.4%
í 18
8.9%
ù 15
7.4%
á 9
4.5%
ò 8
4.0%
ì 6
3.0%
🤷 5
2.5%
Other values (12) 18
8.9%
Punctuation
Value Count Frequency (%)
39
86.7%
4
8.9%
1
2.2%
1
2.2%
Emoticons
Value Count Frequency (%)
😂 5
23.8%
😁 2
9.5%
😍 2
9.5%
😜 2
9.5%
😃 1
4.8%
😒 1
4.8%
🙊 1
4.8%
😉 1
4.8%
😣 1
4.8%
😘 1
4.8%
Other values (4) 4
19.0%
Misc Symbols
Value Count Frequency (%)
4
66.7%
2
33.3%
VS
Value Count Frequency (%)
2
100.0%
Dingbats
Value Count Frequency (%)
1
100.0%

transactions.count.id
Real number (ℝ≥0)

HIGH CORRELATION
MISSING
ZEROS

Count of follow-up actions on the task (the higher the number, the more actions were performed on the task)

Distinct 25
Distinct (%) 0.2%
Missing 20745
Missing (%) 66.2%
Infinite 0
Infinite (%) 0.0%
Mean 4.024768387
Minimum 0
Maximum 24
Zeros 1456
Zeros (%) 4.6%
Negative 0
Negative (%) 0.0%
Memory size 244.8 KiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 1
median 3
Q3 6
95-th percentile 11
Maximum 24
Range 24
Interquartile range (IQR) 5

Descriptive statistics

Standard deviation 3.463494445
Coefficient of variation (CV) 0.860545033
Kurtosis 1.002744266
Mean 4.024768387
Median Absolute Deviation (MAD) 2
Skewness 1.068445946
Sum 42574
Variance 11.99579377
Monotonicity Not monotonic
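Several of the figures above are linked by simple arithmetic. The sketch below recomputes them from the reported values, assuming the mean is taken over non-missing observations only and that the coefficient of variation is the standard deviation divided by the mean. Variable names are illustrative.

```python
n_rows, n_missing = 31323, 20745
total = 42574                 # reported "Sum"
std = 3.463494445             # reported standard deviation
q1, q3 = 1, 6                 # reported quartiles
minimum, maximum = 0, 24

n_present = n_rows - n_missing    # non-missing observations
mean = total / n_present          # mean over non-missing values
cv = std / mean                   # coefficient of variation
variance = std ** 2               # variance is the squared standard deviation
iqr = q3 - q1                     # interquartile range
value_range = maximum - minimum   # range

print(n_present, mean, cv, variance, iqr, value_range)
```

Under these assumptions the results agree with the reported mean, CV, variance, IQR and range to the precision shown in the report.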
Histogram with fixed size bins (bins=25)
Value Count Frequency (%)
0 1456
4.6%
1 1454
4.6%
2 1451
4.6%
3 1268
4.0%
4 1059
3.4%
5 884
2.8%
6 735
2.3%
7 586
1.9%
8 449
1.4%
9 360
1.1%
Other values (15) 876
2.8%
(Missing) 20745
66.2%
Minimum 10 values
Value Count Frequency (%)
0 1456
4.6%
1 1454
4.6%
2 1451
4.6%
3 1268
4.0%
4 1059
3.4%
5 884
2.8%
6 735
2.3%
7 586
1.9%
8 449
1.4%
9 360
1.1%
Maximum 10 values
Value Count Frequency (%)
24 1
< 0.1%
23 1
< 0.1%
22 1
< 0.1%
21 1
< 0.1%
20 2
< 0.1%
19 2
< 0.1%
18 5
< 0.1%
17 16
0.1%
16 24
0.1%
15 40
0.1%

Interactions


Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
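The calculation just described can be sketched in plain Python as follows. This is a minimal illustration, not the report's own code: the helper names are made up for the example, and ties are assumed to receive their average rank.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def covariance_ratio(x, y):
    """Covariance divided by the product of standard deviations.

    The 1/n factors cancel, so plain sums are used.
    """
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the covariance ratio applied to the rank variables."""
    return covariance_ratio(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]   # nonlinear but strictly increasing in x
rho = spearman(x, y)
```

Because `y` increases whenever `x` does, the ranks coincide and `rho` is 1 up to floating-point error, even though the relationship is not linear.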

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line (its angle to the x-axis) does not affect r, apart from its sign.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
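A minimal sketch of this calculation, also checking the invariance under changes of location and scale, might look like the following. The function name `pearson` and the sample data are illustrative assumptions.

```python
from statistics import mean


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations.

    The 1/n normalisation cancels between numerator and denominator,
    so plain sums of deviations are used.
    """
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson(x, y)

# Shifting and rescaling each variable (with positive factors) leaves r unchanged.
x_shifted = [2 * a + 3 for a in x]
y_shifted = [0.5 * b - 1 for b in y]
r_shifted = pearson(x_shifted, y_shifted)
```

Here `r` and `r_shifted` agree up to floating-point error; rescaling by a negative factor would flip the sign of r but not its magnitude.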

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
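The pair-counting definition can be sketched directly. The version below is the simple form usually called tau-a, exactly as worded above; it is an illustrative assumption that this matches the report's variant, since statistical libraries often use a tie-corrected form (tau-b) that differs when values are tied. The function name is made up for the example.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs; tied pairs count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1    # both variables order this pair the same way
        elif sign < 0:
            discordant += 1    # the variables order this pair in opposite ways
    n = len(x)
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
```

In this sample, five of the six pairs are concordant and one is discordant, so `tau` is (5 - 1) / 6.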

Phik (φk)

Phik (φk) is a practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
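As a rough illustration of the nullity correlation mentioned above, the sketch below turns each column into a presence indicator (1 if a value is present, 0 if missing) and correlates the indicators. The function name, the use of `None` to mark missing values, and the sample rows are all hypothetical; the column names are borrowed from the dataset purely for illustration.

```python
def nullity_correlation(rows, col_a, col_b):
    """Pearson correlation between the presence indicators of two columns.

    Returns None when either column is entirely present or entirely missing,
    because the correlation is undefined without variation.
    """
    a = [0 if row.get(col_a) is None else 1 for row in rows]
    b = [0 if row.get(col_b) is None else 1 for row in rows]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    sa = sum((p - ma) ** 2 for p in a) ** 0.5
    sb = sum((q - mb) ** 2 for q in b) ** 0.5
    if sa == 0 or sb == 0:
        return None
    return cov / (sa * sb)


# Hypothetical rows: closeTs and answeredDetails are missing together,
# while note is present only when the other two are missing.
rows = [
    {"closeTs": 1, "answeredDetails": "a", "note": None},
    {"closeTs": None, "answeredDetails": None, "note": "x"},
    {"closeTs": 2, "answeredDetails": "b", "note": None},
    {"closeTs": None, "answeredDetails": None, "note": "y"},
]

together = nullity_correlation(rows, "closeTs", "answeredDetails")
opposite = nullity_correlation(rows, "closeTs", "note")
```

A value near +1 means the two columns tend to be present or missing together; a value near -1 means one tends to be present when the other is missing.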